luplo.core.glossary
===================

.. py:module:: luplo.core.glossary

.. autoapi-nested-parse::

   CRUD + query expansion for the glossary tables.

   The glossary is luplo's strict-first terminology layer. Terms are
   extracted from items, normalised, and grouped. Approved groups power the
   search pipeline's query expansion: for example, ``"vendor"`` expands to
   ``(vendor | shop | NPC벤더)``.

   Three tables: ``glossary_groups``, ``glossary_terms``,
   ``glossary_rejections``. No aggressive clustering: strict LLM matching
   only, with a human curation queue for anything uncertain.


Functions
---------

.. autoapisummary::

   luplo.core.glossary.create_glossary_group
   luplo.core.glossary.get_glossary_group
   luplo.core.glossary.list_glossary_groups
   luplo.core.glossary.create_glossary_term
   luplo.core.glossary.list_pending_terms
   luplo.core.glossary.approve_term
   luplo.core.glossary.reject_term
   luplo.core.glossary.merge_groups
   luplo.core.glossary.split_term
   luplo.core.glossary.create_glossary_group_with_canonical
   luplo.core.glossary.add_term_to_group
   luplo.core.glossary.delete_glossary_term
   luplo.core.glossary.fetch_glossary_map
   luplo.core.glossary.expand_query


Module Contents
---------------

.. py:function:: create_glossary_group(conn: psycopg.AsyncConnection[Any], *, project_id: str, canonical: str, definition: str | None = None, scope: str = 'project', scope_id: str | None = None, created_by: str | None = None, id: str | None = None) -> luplo.core.models.GlossaryGroup
   :async:

   Create a glossary group (a semantic unit with one canonical term).

   :param conn: Async psycopg connection.
   :param project_id: Owning project.
   :param canonical: The canonical surface form for this concept.
   :param definition: Optional one-line definition.
   :param scope: Scope level: ``"project"`` (default) or ``"system"``.
   :param scope_id: System ID when scope is ``"system"``.
   :param created_by: Actor who created this group.
   :param id: Optional ID override.

   :returns: The new ``GlossaryGroup``.


.. py:function:: get_glossary_group(conn: psycopg.AsyncConnection[Any], group_id: str, *, project_id: str | None = None) -> luplo.core.models.GlossaryGroup | None
   :async:

   Fetch a glossary group by ID or hex prefix (≥8 chars).

   Returns ``None`` when nothing matches; raises :class:`AmbiguousIdError`
   when a prefix matches multiple groups. Pass *project_id* to scope prefix
   lookups.


.. py:function:: list_glossary_groups(conn: psycopg.AsyncConnection[Any], project_id: str, *, needs_review: bool = False, limit: int = 100, offset: int = 0) -> list[luplo.core.models.GlossaryGroup]
   :async:

   List glossary groups, optionally filtering to those needing review.

   ``needs_review=True`` returns groups that have pending terms.


.. py:function:: create_glossary_term(conn: psycopg.AsyncConnection[Any], *, group_id: str | None, surface: str, normalized: str, is_protected: bool = False, status: str = 'pending', source_item_id: str | None = None, context_snippet: str | None = None, id: str | None = None) -> luplo.core.models.GlossaryTerm
   :async:

   Create a glossary term (a surface form belonging to a group).


.. py:function:: list_pending_terms(conn: psycopg.AsyncConnection[Any], project_id: str, *, limit: int = 50) -> list[luplo.core.models.GlossaryTerm]
   :async:

   List terms awaiting human curation.

   Includes both grouped pending terms (via group → project) and orphan
   pending terms (via source_item → project).


.. py:function:: approve_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, group_id: str, actor_id: str, as_canonical: bool = False) -> luplo.core.models.GlossaryTerm | None
   :async:

   Approve a pending term into a group.

   :param conn: Async psycopg connection.
   :param term_id: The term to approve.
   :param group_id: Target group.
   :param actor_id: Who approved.
   :param as_canonical: If ``True``, set status to ``"canonical"``
                        (group should have at most one). Otherwise ``"alias"``.

   :returns: The updated term, or ``None`` if not found.


.. py:function:: reject_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, actor_id: str, reason: str | None = None) -> luplo.core.models.GlossaryRejection | None
   :async:

   Reject a term; the system will never re-propose this match.

   Sets the term's status to ``"rejected"`` and inserts a permanent record
   into ``glossary_rejections``.

   :returns: The rejection record, or ``None`` if the term was not found.


.. py:function:: merge_groups(conn: psycopg.AsyncConnection[Any], source_group_id: str, target_group_id: str, *, actor_id: str) -> luplo.core.models.GlossaryGroup | None
   :async:

   Merge the source group into the target: move all terms, then delete the source.

   :returns: The target ``GlossaryGroup`` after merge, or ``None`` if either
             group was not found.


.. py:function:: split_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, new_canonical: str, actor_id: str) -> luplo.core.models.GlossaryGroup | None
   :async:

   Split a term out of its group into a new group.

   The term becomes the canonical member of the new group.

   :param conn: Async psycopg connection.
   :param term_id: The term to split out.
   :param new_canonical: Canonical name for the new group.
   :param actor_id: Who performed the split.

   :returns: The new ``GlossaryGroup``, or ``None`` if the term was not found.


.. py:function:: create_glossary_group_with_canonical(conn: psycopg.AsyncConnection[Any], *, project_id: str, canonical: str, definition: str | None = None, created_by: str | None = None) -> tuple[luplo.core.models.GlossaryGroup, luplo.core.models.GlossaryTerm]
   :async:

   Create a glossary group AND its canonical surface term in one shot.

   The CLI ``lp glossary group create`` flow assumes a group always has a
   canonical term; separating the two creation steps would let users leave
   the system in a broken intermediate state.

   :param conn: Async psycopg connection.
   :param project_id: Owning project.
   :param canonical: Canonical surface form. Becomes both the group's
                     ``canonical`` field and the surface of the seeded term.
   :param definition: Optional one-line definition stored on the group.
   :param created_by: Actor who created this.

   :returns: Tuple of (group, canonical_term).


.. py:function:: add_term_to_group(conn: psycopg.AsyncConnection[Any], group_id: str, *, surface: str, actor_id: str, as_canonical: bool = False) -> luplo.core.models.GlossaryTerm
   :async:

   Add a new surface term to an existing group.

   Default status is ``alias``. When *as_canonical* is true the existing
   canonical (if any) is demoted to ``alias`` first, since there is at most
   one canonical per group.

   :param conn: Async psycopg connection.
   :param group_id: Target group (full UUID or hex prefix of ≥8 chars).
   :param surface: New surface form. Stored verbatim; ``normalized`` is
                   the lowercased copy.
   :param actor_id: Who added this term.
   :param as_canonical: Promote this term to canonical, demoting the
                        current canonical to alias.

   :returns: The newly created ``GlossaryTerm``.

   :raises NotFoundError: If the group does not exist.


.. py:function:: delete_glossary_term(conn: psycopg.AsyncConnection[Any], term_id: str, *, actor_id: str) -> bool
   :async:

   Permanently remove a glossary term.

   Cascade rule (option B): removing the last canonical/alias term in a
   group deletes the group as well, including any rejection records and
   any leftover pending/rejected terms in the same group. Removing the
   canonical while aliases still exist is refused: promote one alias to
   canonical first, or remove the aliases.

   :param conn: Async psycopg connection.
   :param term_id: Term ID or hex prefix (≥8 chars).
   :param actor_id: Who is removing the term (audit trail).

   :returns: ``True`` if a term was removed, ``False`` if the term was
             not found.

   :raises GlossaryGroupHasActiveTermsError: When removing the canonical
       would leave aliases without a canonical anchor.


.. py:function:: fetch_glossary_map(conn: psycopg.AsyncConnection[Any], words: list[str], project_id: str) -> dict[str, list[str]]
   :async:

   Look up glossary aliases for *words* in a single project.

   :param conn: Async psycopg connection.
   :param words: Lowercased words to look up.
   :param project_id: Project scope for glossary lookup.

   :returns: Mapping ``lowercased_word → [surface1, surface2, ...]``.
             Missing keys (no glossary hit) are simply absent from the
             mapping. Each value contains every approved surface in
             the matching group, including the lookup word itself.


.. py:function:: expand_query(conn: psycopg.AsyncConnection[Any], query: str, project_id: str) -> str
   :async:

   Expand a search query using the glossary (legacy, plain-word only).

   Each whitespace-delimited word in *query* is looked up; approved
   aliases are OR'd, groups are AND'd. This helper predates the
   web-search-style query dialect and does not understand phrases,
   negations, or ``OR`` keywords. Pipeline callers should parse with
   :func:`luplo.core.search.tsquery.parse_user_query` and consult
   :func:`fetch_glossary_map` directly. Kept for backwards
   compatibility with any external caller.

   Example::

       >>> await expand_query(conn, "vendor budget", "proj-1")
       '(vendor | shop | NPC벤더) & budget'
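
The plain-word expansion rule described for ``expand_query`` (aliases OR'd within a group, groups AND'd across words) can be illustrated without a database. The following is a minimal sketch, not luplo's implementation: ``expand_with_map`` is a hypothetical helper, and it assumes the mapping has the shape that ``fetch_glossary_map`` is documented to return (lowercased word to list of approved surfaces). How the real function treats casing or single-surface groups is not stated in this reference, so those branches are assumptions.

```python
def expand_with_map(query: str, glossary_map: dict[str, list[str]]) -> str:
    """Hypothetical helper: expand plain words using a word -> surfaces map.

    Assumes ``glossary_map`` is keyed by lowercased words, as documented
    for ``fetch_glossary_map``. Not part of luplo's API.
    """
    parts: list[str] = []
    for word in query.split():  # whitespace-delimited, plain words only
        surfaces = glossary_map.get(word.lower())
        if surfaces and len(surfaces) > 1:
            # Approved surfaces of one group are OR'd together.
            parts.append("(" + " | ".join(surfaces) + ")")
        elif surfaces:
            # Assumption: a single-surface group needs no parentheses.
            parts.append(surfaces[0])
        else:
            # No glossary hit: the word passes through unchanged.
            parts.append(word)
    # Separate words (groups) are AND'd.
    return " & ".join(parts)


# Sample data mirroring the "vendor" example from the module description.
glossary_map = {"vendor": ["vendor", "shop", "NPC벤더"]}
expanded = expand_with_map("vendor budget", glossary_map)
print(expanded)
```

Keeping the string assembly separate from the lookup makes the rule easy to test in isolation; in a real caller the mapping would presumably come from ``await fetch_glossary_map(conn, words, project_id)``.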