luplo.core.search.tsquery
=========================

.. py:module:: luplo.core.search.tsquery

.. autoapi-nested-parse::

   User-facing query parser and ``to_tsquery`` composer.

   The pipeline accepts a small web-search-style dialect:

   - plain ``word``: required term (AND'd with its siblings)
   - ``"exact phrase"``: phrase match (``<->`` in tsquery)
   - ``-word`` or ``-"phrase"``: negation (``!`` in tsquery)
   - ``OR`` (literal, uppercase) between two tokens: disjunction

   The grammar is intentionally tiny. Nested parentheses, regex, fuzzy
   modifiers, and other embellishments are out of scope; see the philosophy
   doc for the five refusals, of which "honesty over coverage" directly
   forbids operator surface creep.

   Glossary expansion is applied only to **required** and **OR-group** terms.
   Phrases and negated tokens pass through literally:

   - Expanding inside a phrase would break the exact-sequence semantic.
   - Expanding a negated term would re-include (via the alias) the concept
     the user explicitly excluded, silently undoing the negation.

   The output of :func:`build_tsquery` is a string safe for
   ``to_tsquery('simple', ...)``. Empty input maps to the empty string so
   callers can short-circuit without a DB round-trip.


Attributes
----------

.. autoapisummary::

   luplo.core.search.tsquery.Clause


Classes
-------

.. autoapisummary::

   luplo.core.search.tsquery.Term
   luplo.core.search.tsquery.OrGroup


Functions
---------

.. autoapisummary::

   luplo.core.search.tsquery.parse_user_query
   luplo.core.search.tsquery.build_tsquery


Module Contents
---------------

.. py:class:: Term

   A single tokenised term from a user query.


   .. py:attribute:: text
      :type: str

      Raw word, or space-joined words if this is a phrase.


   .. py:attribute:: phrase
      :type: bool
      :value: False

      True when the source token was wrapped in ``"..."``.


   .. py:attribute:: negated
      :type: bool
      :value: False

      True when the source token was prefixed with ``-``.


.. py:class:: OrGroup

   Two or more terms joined by the literal ``OR`` keyword.


   .. py:attribute:: members
      :type: tuple[Term, Ellipsis]


.. py:data:: Clause

.. py:function:: parse_user_query(query: str) -> list[Clause]

   Tokenise *query* into AND-joined clauses.

   The returned clauses are meant to be AND'd together at the SQL level.
   An :class:`OrGroup` represents a maximal run of ``A OR B OR C``
   between AND-joined peers.

   Malformed inputs degrade gracefully: an unbalanced quote is treated as
   a literal character inside the surrounding word. Unknown operator
   combinations become plain required terms.

   :param query: Raw user string.

   :returns: List of clauses in source order. Empty list when the input is
             whitespace-only.


.. py:function:: build_tsquery(clauses: list[Clause], glossary_map: dict[str, list[str]] | None = None) -> str

   Render *clauses* into a ``to_tsquery('simple', ...)`` compatible string.

   The composition rule:

   - :class:`Term` with ``phrase=True`` → ``w1 <-> w2 <-> ...``
   - :class:`Term` with ``negated=True`` → ``!`` (negated terms never
     receive glossary expansion)
   - plain :class:`Term` → a glossary-expanded disjunction (or the single
     term when no aliases exist)
   - :class:`OrGroup` → ``( | | ... )``, each member glossary-expanded
     if eligible

   :param clauses: Output of :func:`parse_user_query`.
   :param glossary_map: Lowercased word → list of surface aliases. Missing
                        keys mean "no expansion". ``None`` is treated as an
                        empty map.

   :returns: A string safe to drop into ``to_tsquery('simple', %s)``. Empty
             when *clauses* is empty.
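
The dialect and composition rule documented above can be illustrated with a
small standalone sketch. It is not luplo's implementation: ``parse_sketch``,
``build_sketch``, and the helper names are hypothetical, the ``Term`` and
``OrGroup`` dataclasses only mirror the documented attributes, and several
details are assumptions (negated phrases are wrapped as ``!( ... )``, aliases
are single words, stray ``OR`` tokens fall back to plain terms). It also skips
the sanitisation that makes the real output safe for ``to_tsquery``.

.. code-block:: python

   from __future__ import annotations

   import re
   from dataclasses import dataclass
   from typing import Union


   @dataclass(frozen=True)
   class Term:
       text: str
       phrase: bool = False
       negated: bool = False


   @dataclass(frozen=True)
   class OrGroup:
       members: tuple[Term, ...]


   Clause = Union[Term, OrGroup]

   # A token is an optionally negated balanced "phrase", or a run of non-spaces.
   # An unbalanced quote fails the first alternative and stays inside the word.
   _TOKEN = re.compile(r'-?"[^"]*"|\S+')


   def _to_term(raw: str) -> Term:
       negated = raw.startswith("-") and len(raw) > 1
       body = raw[1:] if negated else raw
       if len(body) >= 2 and body.startswith('"') and body.endswith('"'):
           return Term(" ".join(body[1:-1].split()), phrase=True, negated=negated)
       return Term(body, negated=negated)


   def parse_sketch(query: str) -> list[Clause]:
       tokens = _TOKEN.findall(query)
       clauses: list[Clause] = []
       join_next = False
       for i, raw in enumerate(tokens):
           is_last = i == len(tokens) - 1
           # OR only acts as an operator between two tokens (assumption).
           if raw == "OR" and clauses and not join_next and not is_last:
               join_next = True
               continue
           term = _to_term(raw)
           if join_next:
               prev = clauses.pop()
               members = prev.members if isinstance(prev, OrGroup) else (prev,)
               clauses.append(OrGroup(members + (term,)))
               join_next = False
           else:
               clauses.append(term)
       return clauses


   def _render_term(term: Term, glossary: dict[str, list[str]]) -> str:
       if term.phrase:
           body = " <-> ".join(term.text.split())
           return f"!({body})" if term.negated else body
       if term.negated:
           return f"!{term.text}"  # negated terms are never expanded
       aliases = glossary.get(term.text.lower(), [])
       if not aliases:
           return term.text
       return "(" + " | ".join([term.text, *aliases]) + ")"


   def build_sketch(
       clauses: list[Clause],
       glossary_map: dict[str, list[str]] | None = None,
   ) -> str:
       glossary = glossary_map or {}
       parts = []
       for clause in clauses:
           if isinstance(clause, OrGroup):
               inner = " | ".join(_render_term(m, glossary) for m in clause.members)
               parts.append(f"({inner})")
           else:
               parts.append(_render_term(clause, glossary))
       return " & ".join(parts)


   clauses = parse_sketch('cat "big dog" -fish red OR blue')
   print(build_sketch(clauses, {"cat": ["feline"]}))
   # (cat | feline) & big <-> dog & !fish & (red | blue)

In the sketch only the plain term ``cat`` picks up its alias; the phrase and
the negated term pass through literally, which matches the expansion rule
described in the module docstring. Whitespace-only input yields an empty
clause list and therefore an empty string.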