luplo.core.search.tsquery

User-facing query parser and to_tsquery composer.

The pipeline accepts a small web-search-style dialect:

  • plain word — required term (AND’d with its siblings)

  • "exact phrase" — phrase match (<-> in tsquery)

  • -word or -"phrase" — negation (! in tsquery)

  • OR (literal, uppercase) between two tokens — disjunction

The grammar is intentionally tiny. Nested parentheses, regex, fuzzy modifiers, and other embellishments are out of scope — see the philosophy doc for the five refusals, of which “honesty over coverage” directly forbids operator surface creep.

Glossary expansion is applied only to required and OR-group terms. Phrases and negated tokens pass through literally:

  • Expanding inside a phrase would break the exact-sequence semantic.

  • Expanding a negated term would re-include (via the alias) the concept the user explicitly excluded — silently undoing the negation.
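
The eligibility rule above can be sketched as a small helper. This is an illustration only: `expansion_candidates` is a hypothetical name, and the `Term` dataclass below merely mirrors the fields documented further down, so the real implementation may be organised differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    # Mirrors the documented fields; the real class may differ in detail.
    text: str
    phrase: bool = False
    negated: bool = False


def expansion_candidates(term: Term, glossary_map: dict[str, list[str]]) -> list[str]:
    """Return the surface forms to match for ``term`` (hypothetical helper)."""
    if term.phrase or term.negated:
        # Phrases and negated tokens pass through literally.
        return [term.text]
    # Assumption: glossary keys are lowercased, so look up the lowercased word.
    forms = [term.text]
    for alias in glossary_map.get(term.text.lower(), []):
        if alias not in forms:
            forms.append(alias)
    return forms
```

With a glossary of `{"db": ["database"]}`, a plain `db` yields both forms, while `-db` yields only `db`, so the alias cannot re-include what the user excluded.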

The output of build_tsquery() is a string safe for to_tsquery('simple', ...). Empty input maps to the empty string so callers can short-circuit without a DB round-trip.
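
A caller might use the empty-string contract as follows. This is a sketch under assumptions: `search`, the `run_query` callable, and the `docs` table with a `tsv` column are all hypothetical and not part of this module.

```python
from typing import Callable


def search(tsquery: str, run_query: Callable[[str, tuple], list]) -> list:
    """Skip the database entirely when the rendered tsquery is empty.

    ``run_query`` is a hypothetical callable taking (sql, params) and
    returning rows; substitute whatever DB layer is actually in use.
    """
    if not tsquery:
        # Empty input rendered to "", so there is nothing to match.
        return []
    # Hypothetical table and column names, for illustration only.
    sql = "SELECT id FROM docs WHERE tsv @@ to_tsquery('simple', %s)"
    return run_query(sql, (tsquery,))
```

Passing the rendered string as a bound parameter, rather than interpolating it into the SQL, keeps the query text constant regardless of user input.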

Attributes

Clause

Classes

Term

A single tokenised term from a user query.

OrGroup

Two or more terms joined by the literal OR keyword.

Functions

parse_user_query(→ list[Clause])

Tokenise query into AND-joined clauses.

build_tsquery(→ str)

Render clauses into a to_tsquery('simple', ...) compatible string.

Module Contents

class luplo.core.search.tsquery.Term

A single tokenised term from a user query.

text: str

Raw word, or space-joined words if this is a phrase.

phrase: bool = False

True when the source token was wrapped in "...".

negated: bool = False

True when the source token was prefixed with -.

class luplo.core.search.tsquery.OrGroup

Two or more terms joined by the literal OR keyword.

members: tuple[Term, ...]

luplo.core.search.tsquery.Clause
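
The data model can be pictured roughly like this. The sketch assumes frozen dataclasses and assumes that `Clause` is an alias over `Term` and `OrGroup`; neither detail is confirmed by the reference above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    text: str              # raw word, or space-joined words for a phrase
    phrase: bool = False   # True when the token was wrapped in "..."
    negated: bool = False  # True when the token was prefixed with -


@dataclass(frozen=True)
class OrGroup:
    members: tuple[Term, ...]  # two or more terms joined by OR


# Assumption: a clause is either a single term or an OR group.
Clause = Union[Term, OrGroup]

# Hand-built clauses for the query: postgres "full text" -mysql index OR gin
clauses: list[Clause] = [
    Term("postgres"),
    Term("full text", phrase=True),
    Term("mysql", negated=True),
    OrGroup((Term("index"), Term("gin"))),
]
```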

luplo.core.search.tsquery.parse_user_query(query: str) → list[Clause]

Tokenise query into AND-joined clauses.

The returned clauses are meant to be AND’d together at the SQL level. An OrGroup represents a maximal run of A OR B OR C between AND-joined peers.

Malformed inputs degrade gracefully — an unbalanced quote is treated as a literal character inside the surrounding word. Unknown operator combinations become plain required terms.

Parameters:

query – Raw user string.

Returns:

List of clauses in source order. Empty list when the input is whitespace-only.
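
To make the parsing rules concrete, here is a minimal re-implementation sketch. It is not the module's actual code: `parse_user_query_sketch` and the regular expression are illustrative, the dataclasses are simplified stand-ins, and degrading a dangling `OR` to a plain term named `OR` is one possible reading of "unknown operator combinations become plain required terms".

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Term:
    text: str
    phrase: bool = False
    negated: bool = False


@dataclass(frozen=True)
class OrGroup:
    members: tuple[Term, ...]


Clause = Union[Term, OrGroup]

# Either an optionally negated, balanced "quoted phrase", or a run of
# non-space characters. An unbalanced quote falls into the second branch
# and therefore stays a literal character inside the word.
_TOKEN = re.compile(r'(-?)"([^"]*)"|(\S+)')


def parse_user_query_sketch(query: str) -> list[Clause]:
    tokens: list[Optional[Term]] = []  # None marks a literal OR keyword
    for match in _TOKEN.finditer(query):
        neg, phrase, word = match.groups()
        if phrase is not None:
            text = " ".join(phrase.split())
            if text:  # ignore empty quotes
                tokens.append(Term(text, phrase=True, negated=bool(neg)))
        elif word == "OR":
            tokens.append(None)
        elif word.startswith("-") and len(word) > 1:
            tokens.append(Term(word[1:], negated=True))
        else:
            tokens.append(Term(word))

    clauses: list[Clause] = []
    i, n = 0, len(tokens)
    while i < n:
        tok = tokens[i]
        if tok is None:
            # Dangling OR with no usable neighbours: degrade to a plain term.
            clauses.append(Term("OR"))
            i += 1
            continue
        run = [tok]
        # Greedily absorb every following "OR <term>" pair (maximal run).
        while i + 2 < n and tokens[i + 1] is None and tokens[i + 2] is not None:
            run.append(tokens[i + 2])
            i += 2
        clauses.append(run[0] if len(run) == 1 else OrGroup(tuple(run)))
        i += 1
    return clauses
```

For example, `a OR b OR c d` parses to one `OrGroup` of three members followed by the plain term `d`, and a whitespace-only string parses to an empty list.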

luplo.core.search.tsquery.build_tsquery(clauses: list[Clause], glossary_map: dict[str, list[str]] | None = None) → str

Render clauses into a to_tsquery('simple', ...) compatible string.

The composition rule:

  • Term with phrase=True → w1 <-> w2 <-> ...

  • Term with negated=True → ! <rendered inner> (negated terms never receive glossary expansion)

  • plain Term → a glossary-expanded disjunction (or the single term when no aliases exist)

  • OrGroup → ( <m1> | <m2> | ... ), each member glossary-expanded if eligible

Parameters:
  • clauses – Output of parse_user_query().

  • glossary_map – Lowercased word → list of surface aliases. Missing keys mean “no expansion”. None is treated as an empty map.

Returns:

A string safe to drop into to_tsquery('simple', %s). Empty when clauses is empty.
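
The composition rule can be sketched end to end as below. This is an approximation, not the module's code: `build_tsquery_sketch` and its helpers are hypothetical, the dataclasses are simplified stand-ins, and stripping everything except word characters is an assumed way of keeping the output safe for to_tsquery (the real sanitisation strategy is not documented here).

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Term:
    text: str
    phrase: bool = False
    negated: bool = False


@dataclass(frozen=True)
class OrGroup:
    members: tuple[Term, ...]


Clause = Union[Term, OrGroup]


def _sequence(text: str) -> str:
    # Assumption: keep only word characters so no tsquery operator leaks in.
    return " <-> ".join(re.findall(r"\w+", text))


def _any_of(parts: list[str]) -> str:
    if len(parts) <= 1:
        return parts[0] if parts else ""
    return "( " + " | ".join(parts) + " )"


def _render_term(term: Term, glossary: dict[str, list[str]]) -> str:
    base = _sequence(term.text)
    if not base:
        return ""
    if term.negated:
        # ! binds tighter than <->, so a multi-word inner needs parentheses.
        return "!(" + base + ")" if " " in base else "!" + base
    if term.phrase:
        return base
    forms = [base]
    for alias in glossary.get(term.text.lower(), []):
        rendered = _sequence(alias)
        if rendered and rendered not in forms:
            forms.append(rendered)
    return _any_of(forms)


def build_tsquery_sketch(
    clauses: list[Clause],
    glossary_map: Optional[dict[str, list[str]]] = None,
) -> str:
    glossary = glossary_map or {}
    parts = []
    for clause in clauses:
        if isinstance(clause, OrGroup):
            members = [_render_term(m, glossary) for m in clause.members]
            rendered = _any_of([m for m in members if m])
        else:
            rendered = _render_term(clause, glossary)
        if rendered:
            parts.append(rendered)
    # Top-level clauses are AND-joined.
    return " & ".join(parts)
```

Given the glossary `{"db": ["database"], "mysql": ["mariadb"]}`, the clauses for `db "full text" -mysql index OR gin` render in this sketch as `( db | database ) & full <-> text & !mysql & ( index | gin )`. The negated `mysql` does not pick up its alias.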