luplo.core.glossary
CRUD + query expansion for the glossary tables.
The glossary is luplo’s strict-first terminology layer. Terms are extracted
from items, normalised, and grouped. Approved groups power the search
pipeline’s query expansion — e.g. "vendor" expands to
(vendor | shop | NPC벤더).
Three tables: glossary_groups, glossary_terms, glossary_rejections.
No aggressive clustering — strict LLM matching only, with a human curation
queue for anything uncertain.
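To make the three-table layout concrete, here is a minimal in-memory sketch using plain dataclasses. The class names (`GlossaryGroupRow`, `GlossaryTermRow`, `GlossaryRejectionRow`) and the helper `approved_surfaces()` are hypothetical. Their fields are limited to those visible in the function signatures below, and the real `luplo.core.models` classes may carry more. The status values are the ones this module's docstrings mention.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Term lifecycle states named in this module's docstrings.
TermStatus = Literal["pending", "alias", "canonical", "rejected"]


@dataclass
class GlossaryGroupRow:
    """Hypothetical mirror of a glossary_groups row: one semantic unit."""
    id: str
    project_id: str
    canonical: str
    definition: Optional[str] = None
    scope: str = "project"          # "project" or "system"
    scope_id: Optional[str] = None  # system ID when scope == "system"


@dataclass
class GlossaryTermRow:
    """Hypothetical mirror of a glossary_terms row: one surface form."""
    id: str
    group_id: Optional[str]         # None for orphan pending terms
    surface: str
    normalized: str
    status: TermStatus = "pending"
    source_item_id: Optional[str] = None


@dataclass
class GlossaryRejectionRow:
    """Hypothetical mirror of a glossary_rejections row (permanent)."""
    term_id: str
    actor_id: str
    reason: Optional[str] = None


def approved_surfaces(group: GlossaryGroupRow, terms: list[GlossaryTermRow]) -> list[str]:
    """Surfaces of a group that are approved, i.e. canonical or alias."""
    return [t.surface for t in terms
            if t.group_id == group.id and t.status in ("canonical", "alias")]
```

In this sketch pending and rejected terms never reach query expansion; only approved (canonical or alias) surfaces do, which is the strict-first behaviour described above.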
Functions
| create_glossary_group() | Create a glossary group (a semantic unit with one canonical term). |
| get_glossary_group() | Fetch a glossary group by ID or hex prefix (≥8 chars). |
| list_glossary_groups() | List glossary groups, optionally filtering to those needing review. |
| create_glossary_term() | Create a glossary term (a surface form belonging to a group). |
| list_pending_terms() | List terms awaiting human curation. |
| approve_term() | Approve a pending term into a group. |
| reject_term() | Reject a term — the system will never re-propose this match. |
| merge_groups() | Merge source group into target — move all terms, delete source. |
| split_term() | Split a term out of its group into a new group. |
| create_glossary_group_with_canonical() | Create a glossary group AND its canonical surface term in one shot. |
| add_term_to_group() | Add a new surface term to an existing group. |
| delete_glossary_term() | Permanently remove a glossary term. |
| fetch_glossary_map() | Look up glossary aliases for words in a single project. |
| expand_query() | Expand a search query using the glossary (legacy, plain-word only). |
Module Contents
- async luplo.core.glossary.create_glossary_group(conn: psycopg.AsyncConnection[Any], *, project_id: str, canonical: str, definition: str | None = None, scope: str = 'project', scope_id: str | None = None, created_by: str | None = None, id: str | None = None) → luplo.core.models.GlossaryGroup
Create a glossary group (a semantic unit with one canonical term).
- Parameters:
conn – Async psycopg connection.
project_id – Owning project.
canonical – The canonical surface form for this concept.
definition – Optional one-line definition.
scope – Scope level: "project" (default) or "system".
scope_id – System ID when scope is "system".
created_by – Actor who created this group.
id – Optional ID override.
- Returns:
The new GlossaryGroup.
- async luplo.core.glossary.get_glossary_group(conn: psycopg.AsyncConnection[Any], group_id: str, *, project_id: str | None = None) → luplo.core.models.GlossaryGroup | None
Fetch a glossary group by ID or hex prefix (≥8 chars).
Returns None when nothing matches; raises AmbiguousIdError when a prefix matches multiple groups. Pass project_id to scope prefix lookups.
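The ID-or-prefix contract can be illustrated with a small pure-Python resolver. This is a sketch under assumptions, not luplo's implementation: `resolve_id()` and the local `AmbiguousIdError` class are hypothetical stand-ins, IDs are assumed to be lowercase UUID strings, and treating a too-short prefix as "no match" is a guess (the real function may raise instead). Scoping by project_id would correspond to filtering `known_ids` to one project before resolving.

```python
from typing import Optional


class AmbiguousIdError(LookupError):
    """Hypothetical stand-in for luplo's AmbiguousIdError."""


MIN_PREFIX_LEN = 8  # documented minimum hex-prefix length


def resolve_id(known_ids: list[str], id_or_prefix: str) -> Optional[str]:
    """Resolve a full ID or a hex prefix against a list of known IDs.

    Returns None when nothing matches and raises AmbiguousIdError when a
    prefix matches more than one ID.
    """
    needle = id_or_prefix.lower()
    if needle in known_ids:
        return needle                # an exact full-ID match wins
    if len(needle) < MIN_PREFIX_LEN:
        return None                  # ASSUMPTION: short prefixes count as "no match"
    hits = [i for i in known_ids if i.startswith(needle)]
    if len(hits) > 1:
        raise AmbiguousIdError(f"{id_or_prefix!r} matches {len(hits)} IDs")
    return hits[0] if hits else None
```

Two UUIDs sharing their first eight hex digits make a prefix like `3f2a9c1e` ambiguous, which is exactly the case the real function reports with AmbiguousIdError.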
- async luplo.core.glossary.list_glossary_groups(conn: psycopg.AsyncConnection[Any], project_id: str, *, needs_review: bool = False, limit: int = 100, offset: int = 0) → list[luplo.core.models.GlossaryGroup]
List glossary groups, optionally filtering to those needing review.
needs_review=True returns only groups that have pending terms.
- async luplo.core.glossary.create_glossary_term(conn: psycopg.AsyncConnection[Any], *, group_id: str | None, surface: str, normalized: str, is_protected: bool = False, status: str = 'pending', source_item_id: str | None = None, context_snippet: str | None = None, id: str | None = None) → luplo.core.models.GlossaryTerm
Create a glossary term (a surface form belonging to a group).
- async luplo.core.glossary.list_pending_terms(conn: psycopg.AsyncConnection[Any], project_id: str, *, limit: int = 50) → list[luplo.core.models.GlossaryTerm]
List terms awaiting human curation.
Includes both grouped pending terms (via group → project) and orphan pending terms (via source_item → project).
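The two ownership paths can be sketched in memory as follows. `PendingTerm` and `pending_terms_for_project()` are hypothetical, the two dictionaries stand in for the group → project and source_item → project joins, and the output order (input order, truncated to limit) is an assumption; the real query may sort differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingTerm:
    id: str
    status: str
    group_id: Optional[str] = None        # None for orphan terms
    source_item_id: Optional[str] = None  # item the term was extracted from


def pending_terms_for_project(
    terms: list[PendingTerm],
    group_project: dict[str, str],  # group ID -> owning project
    item_project: dict[str, str],   # item ID -> owning project
    project_id: str,
    limit: int = 50,
) -> list[PendingTerm]:
    result = []
    for term in terms:
        if term.status != "pending":
            continue
        if term.group_id is not None:
            owner = group_project.get(term.group_id)       # grouped: group -> project
        elif term.source_item_id is not None:
            owner = item_project.get(term.source_item_id)  # orphan: source_item -> project
        else:
            owner = None                                   # cannot be attributed
        if owner == project_id:
            result.append(term)
    return result[:limit]
```

Without the second path, orphan terms (no group yet) would never surface in any project's curation queue.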
- async luplo.core.glossary.approve_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, group_id: str, actor_id: str, as_canonical: bool = False) → luplo.core.models.GlossaryTerm | None
Approve a pending term into a group.
- Parameters:
conn – Async psycopg connection.
term_id – The term to approve.
group_id – Target group.
actor_id – Who approved.
as_canonical – If True, set status to "canonical" (the group should have at most one); otherwise "alias".
- Returns:
The updated term, or None if not found.
- async luplo.core.glossary.reject_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, actor_id: str, reason: str | None = None) → luplo.core.models.GlossaryRejection | None
Reject a term — the system will never re-propose this match.
Sets the term’s status to "rejected" and inserts a permanent record into glossary_rejections.
- Returns:
The rejection record, or None if the term was not found.
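How a permanent rejection can stop re-proposals is sketched below. Everything here is hypothetical: the `Glossary` class, `may_propose()`, and especially the choice of (normalized surface, group) as the rejection key, since the docs do not say which columns glossary_rejections records.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    normalized: str
    group_id: Optional[str]
    status: str = "pending"


@dataclass
class Glossary:
    terms: dict[str, Term] = field(default_factory=dict)
    # Stand-in for glossary_rejections: (normalized surface, group ID) pairs.
    # ASSUMPTION: the real table's key columns are not documented.
    rejections: set[tuple[str, Optional[str]]] = field(default_factory=set)

    def reject_term(self, term_id: str) -> bool:
        term = self.terms.get(term_id)
        if term is None:
            return False  # the real function returns None in this case
        term.status = "rejected"                            # 1. flip the status
        self.rejections.add((term.normalized, term.group_id))  # 2. permanent record
        return True

    def may_propose(self, normalized: str, group_id: Optional[str]) -> bool:
        """An extractor would consult this before proposing a match again."""
        return (normalized, group_id) not in self.rejections
```

Keeping the record separate from the term row means the veto survives even if the term itself is later cleaned up.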
- async luplo.core.glossary.merge_groups(conn: psycopg.AsyncConnection[Any], source_group_id: str, target_group_id: str, *, actor_id: str) → luplo.core.models.GlossaryGroup | None
Merge source group into target — move all terms, delete source.
- Returns:
The target GlossaryGroup after the merge, or None if either group was not found.
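A minimal in-memory sketch of the merge follows. `merge_groups_sketch()` is hypothetical, and two of its rules are assumptions not stated in the docs: the source group's canonical term is demoted to alias so the target keeps a single canonical, and merging a group into itself is refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    surface: str
    group_id: str
    status: str  # "canonical", "alias", "pending", ...


def merge_groups_sketch(group_ids: set[str], terms: list[Term],
                        source: str, target: str) -> Optional[str]:
    if source not in group_ids or target not in group_ids:
        return None                      # either group missing
    if source == target:
        raise ValueError("cannot merge a group into itself")  # ASSUMPTION
    for t in terms:
        if t.group_id == source:
            t.group_id = target          # move every term into the target
            if t.status == "canonical":
                t.status = "alias"       # ASSUMPTION: keep one canonical per group
    group_ids.discard(source)            # delete the now-empty source group
    return target
```

In a database this would all run inside one transaction so that no reader sees terms pointing at a deleted group.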
- async luplo.core.glossary.split_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, new_canonical: str, actor_id: str) → luplo.core.models.GlossaryGroup | None
Split a term out of its group into a new group.
The term becomes the canonical member of the new group.
- Parameters:
conn – Async psycopg connection.
term_id – The term to split out.
new_canonical – Canonical name for the new group.
actor_id – Who performed the split.
- Returns:
The new GlossaryGroup, or None if the term was not found.
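The split can be pictured with the in-memory sketch below. `split_term_sketch()` and the dictionary layout are hypothetical, and the new group ID is generated with `uuid.uuid4()` purely for illustration. The docs do not say what happens when the split-out term was the old group's canonical, so this sketch leaves the old group untouched.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    surface: str
    group_id: str
    status: str


def split_term_sketch(groups: dict[str, str], terms: dict[str, Term],
                      term_id: str, new_canonical: str) -> Optional[str]:
    """groups maps group ID -> canonical name; returns the new group's ID."""
    term = terms.get(term_id)
    if term is None:
        return None                    # mirrors "None if the term was not found"
    new_id = str(uuid.uuid4())
    groups[new_id] = new_canonical     # new group named after new_canonical
    term.group_id = new_id             # move the term out of its old group
    term.status = "canonical"          # it becomes the new group's canonical member
    return new_id
```

Split is the inverse of merge: it lets a curator undo an over-eager grouping without deleting and re-entering the term.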
- async luplo.core.glossary.create_glossary_group_with_canonical(conn: psycopg.AsyncConnection[Any], *, project_id: str, canonical: str, definition: str | None = None, created_by: str | None = None) → tuple[luplo.core.models.GlossaryGroup, luplo.core.models.GlossaryTerm]
Create a glossary group AND its canonical surface term in one shot.
The CLI's "lp glossary group create" flow assumes a group always has a canonical term; separating the two creation steps would let users leave the system in a broken intermediate state.
- Parameters:
conn – Async psycopg connection.
project_id – Owning project.
canonical – Canonical surface form. Becomes both the group’s canonical field and the surface of the seeded term.
definition – Optional one-line definition stored on the group.
created_by – Actor who created this.
- Returns:
Tuple of (group, canonical_term).
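The "one shot" guarantee amounts to wrapping both inserts in a single transaction. The sketch below uses the standard-library sqlite3 module as a stand-in for psycopg (with psycopg 3 the equivalent would be an `async with conn.transaction():` block). The table layout, the `CHECK` constraint, and the helper names are hypothetical, chosen only so that a failing term insert visibly rolls back the group insert as well.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical, trimmed-down schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE glossary_groups (
            id TEXT PRIMARY KEY, project_id TEXT NOT NULL, canonical TEXT NOT NULL);
        CREATE TABLE glossary_terms (
            id TEXT PRIMARY KEY, group_id TEXT NOT NULL REFERENCES glossary_groups(id),
            surface TEXT NOT NULL CHECK (surface <> ''), normalized TEXT NOT NULL,
            status TEXT NOT NULL);
    """)
    return db


def create_group_with_canonical(db: sqlite3.Connection, project_id: str,
                                canonical: str) -> tuple[str, str]:
    group_id, term_id = str(uuid.uuid4()), str(uuid.uuid4())
    with db:  # one transaction: commit on success, roll back on any exception
        db.execute(
            "INSERT INTO glossary_groups (id, project_id, canonical) VALUES (?, ?, ?)",
            (group_id, project_id, canonical))
        db.execute(
            "INSERT INTO glossary_terms (id, group_id, surface, normalized, status) "
            "VALUES (?, ?, ?, ?, 'canonical')",
            (term_id, group_id, canonical, canonical.lower()))
    return group_id, term_id
```

If the second insert fails, the group row disappears with it, so no caller can observe a group without its canonical term.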
- async luplo.core.glossary.add_term_to_group(conn: psycopg.AsyncConnection[Any], group_id: str, *, surface: str, actor_id: str, as_canonical: bool = False) → luplo.core.models.GlossaryTerm
Add a new surface term to an existing group.
Default status is alias. When as_canonical is true, the existing canonical (if any) is first demoted to alias; there is at most one canonical per group.
- Parameters:
conn – Async psycopg connection.
group_id – Target group (full UUID or ≥8 hex prefix).
surface – New surface form. Stored verbatim; normalized is the lowercased copy.
actor_id – Who added this term.
as_canonical – Promote this term to canonical, demoting the current canonical to alias.
- Returns:
The newly created GlossaryTerm.
- Raises:
NotFoundError – If the group does not exist.
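The demote-then-insert rule can be sketched over a plain list of a group's terms. `add_term_sketch()` is hypothetical, and Python's `str.lower()` stands in for whatever lowercasing the real code applies. In the database version both steps would need to share one transaction so the group never briefly holds two canonicals.

```python
from dataclasses import dataclass


@dataclass
class Term:
    surface: str
    normalized: str
    status: str


def add_term_sketch(group_terms: list[Term], surface: str,
                    as_canonical: bool = False) -> Term:
    if as_canonical:
        for t in group_terms:
            if t.status == "canonical":
                t.status = "alias"   # demote first: at most one canonical per group
    term = Term(
        surface=surface,             # stored verbatim
        normalized=surface.lower(),  # lowercased copy
        status="canonical" if as_canonical else "alias",
    )
    group_terms.append(term)
    return term
```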
- async luplo.core.glossary.delete_glossary_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, actor_id: str) → bool
Permanently remove a glossary term.
Cascade rule (option B): removing the last canonical/alias term in a group deletes the group as well, including any rejection records and any leftover pending/rejected terms in the same group. Removing the canonical while aliases still exist is refused — promote one alias to canonical first, or remove the aliases.
- Parameters:
conn – Async psycopg connection.
term_id – Term ID or hex prefix (≥8 chars).
actor_id – Who is removing the term (audit trail).
- Returns:
True if a term was removed, False if the term was not found.
- Raises:
GlossaryGroupHasActiveTermsError – When removing the canonical would leave aliases without a canonical anchor.
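Cascade rule B can be expressed as an in-memory sketch. `delete_term_sketch()` is hypothetical, the local exception class only mirrors the documented name, and rejection records are modelled as (rejection ID, group ID) pairs, which is an assumption about how they link to a group. Deleting a pending or rejected term never triggers the cascade here, because only canonical/alias terms count as "last".

```python
from dataclasses import dataclass


class GlossaryGroupHasActiveTermsError(Exception):
    """Hypothetical stand-in for the documented exception."""


@dataclass
class Term:
    id: str
    group_id: str
    status: str


ACTIVE = ("canonical", "alias")


def delete_term_sketch(groups: set[str], terms: list[Term],
                       rejections: list[tuple[str, str]], term_id: str) -> bool:
    term = next((t for t in terms if t.id == term_id), None)
    if term is None:
        return False
    gid = term.group_id
    others_active = [t for t in terms
                     if t.group_id == gid and t.status in ACTIVE and t.id != term.id]
    if term.status == "canonical" and others_active:
        # Aliases would be left without a canonical anchor: refuse.
        raise GlossaryGroupHasActiveTermsError(gid)
    terms.remove(term)
    if term.status in ACTIVE and not others_active:
        # Last canonical/alias removed: drop the group plus leftovers in it.
        terms[:] = [t for t in terms if t.group_id != gid]
        rejections[:] = [r for r in rejections if r[1] != gid]
        groups.discard(gid)
    return True
```

Refusing instead of auto-promoting keeps the choice of the next canonical with a human curator.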
- async luplo.core.glossary.fetch_glossary_map(conn: psycopg.AsyncConnection[Any], words: list[str], project_id: str) → dict[str, list[str]]
Look up glossary aliases for words in a single project.
- Parameters:
conn – Async psycopg connection.
words – Lowercased words to look up.
project_id – Project scope for glossary lookup.
- Returns:
Mapping lowercased_word → [surface1, surface2, ...]. Missing keys (no glossary hit) are simply absent from the mapping. Each value contains every approved surface in the matching group, including the lookup word itself.
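The shape of the returned mapping can be reproduced without a database. In this sketch `build_glossary_map()` is hypothetical, each inner list stands for one group's approved surfaces (canonical plus aliases), and taking the first matching group when a word appears in several is an assumption.

```python
def build_glossary_map(words: list[str], groups: list[list[str]]) -> dict[str, list[str]]:
    """groups: approved surfaces per group (canonical + aliases)."""
    result: dict[str, list[str]] = {}
    for word in words:
        for surfaces in groups:
            if word in (s.lower() for s in surfaces):
                # Every approved surface of the group, including the word itself.
                result[word] = list(surfaces)
                break  # ASSUMPTION: first matching group wins
    return result
```

For example, with one group `["vendor", "shop", "NPC벤더"]`, looking up `["vendor", "budget"]` yields a map with a `"vendor"` key only; `"budget"` is simply absent.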
- async luplo.core.glossary.expand_query(conn: psycopg.AsyncConnection[Any], query: str, project_id: str) → str
Expand a search query using the glossary (legacy, plain-word only).
Each whitespace-delimited word in query is looked up; approved aliases are OR’d, groups are AND’d. This helper predates the web-search-style query dialect and does not understand phrases, negations, or OR keywords; pipeline callers should parse with luplo.core.search.tsquery.parse_user_query() and consult fetch_glossary_map() directly. Kept for backwards compatibility with any external caller.
Example:
>>> await expand_query(conn, "vendor budget", "proj-1")
"(vendor | shop | NPC벤더) & budget"
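The OR-within-word, AND-across-words rule can be sketched as a pure function over a glossary map of the shape fetch_glossary_map() returns. `expand_query_sketch()` is hypothetical, and leaving a word bare when its group has only one surface is an assumption; like the legacy helper, it makes no attempt to handle phrases, negations, or OR keywords.

```python
def expand_query_sketch(query: str, glossary_map: dict[str, list[str]]) -> str:
    clauses = []
    for word in query.split():                 # plain whitespace tokens only
        aliases = glossary_map.get(word.lower())
        if aliases and len(aliases) > 1:
            clauses.append("(" + " | ".join(aliases) + ")")  # OR the aliases
        else:
            clauses.append(word)               # no glossary hit: keep the word
    return " & ".join(clauses)                 # AND across words
```

Fed the map `{"vendor": ["vendor", "shop", "NPC벤더"]}`, the query `"vendor budget"` produces `"(vendor | shop | NPC벤더) & budget"`, matching the documented example.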