Given a query string, return the top k hits from the ontology search index.

The search index is built from each ontology term's name and synonyms, with names weighted more heavily than synonyms. Results are ranked with the BM25+ algorithm.
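The ranking step can be illustrated with a small, self-contained BM25+ scorer. This is a sketch of the standard BM25+ formula, not this package's implementation. The function name `bm25_plus_scores`, the use of pre-tokenized documents, and the defaults k1=1.5, b=0.75, delta=1.0 are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_plus_scores(query, docs, k1=1.5, b=0.75, delta=1.0):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper: a plain BM25+ sketch, not the package's own code.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # only query terms present in the document contribute
            idf = math.log((n_docs + 1) / df[term])
            # BM25+ adds delta to the saturated term-frequency part, giving every
            # matching term a lower-bounded contribution regardless of doc length.
            score += idf * ((f * (k1 + 1)) / (f + norm) + delta)
        scores.append(score)
    return scores
```

With this formula, a document matching more query terms (or rarer ones) outranks one matching fewer, and a document matching none scores zero.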

Parameters:
  • query (str) –

    The query string.

  • db (Path | None, default: None) –

    Path to the DuckDB database file, or None to use the default location.

  • k (int, default: 20) –

    The number of top hits to return.

  • type (str | None, default: None) –

    If given, restrict results to this type (e.g. "celltype", "disease", or "tissue").

  • ontology (str | None, default: None) –

    If given, restrict results to this ontology (e.g. "CL", "UBERON", or "MONDO").

  • verbose (bool, default: False) –

    If True, print debug information.

Returns:
  • DataFrame

    A polars.DataFrame object with columns: term_id, ontology, name, type, synonyms, score.
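The interaction of the type, ontology, and k parameters can be sketched in plain Python. This assumes hits are dicts carrying "type", "ontology", and "score" keys (mirroring the returned columns) and that filtering happens before the top-k cut; `top_k_hits` is a hypothetical helper, and the real function may apply filters differently (for example inside the database query).

```python
import heapq


def top_k_hits(hits, k=20, type=None, ontology=None):
    """Return the k highest-scoring hits, optionally filtered by type and ontology.

    Hypothetical helper; `type` shadows the builtin only to mirror the documented
    parameter name.
    """
    filtered = (
        h
        for h in hits
        if (type is None or h["type"] == type)
        and (ontology is None or h["ontology"] == ontology)
    )
    # nlargest avoids fully sorting the candidates when only k are needed.
    return heapq.nlargest(k, filtered, key=lambda h: h["score"])
```

Applying the filters before taking the top k ensures that a restricted search still returns up to k matching hits, rather than fewer hits left over after discarding non-matching ones.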

Build the doc_text column for BM25 indexing from the name and synonyms.

See the NAME_WEIGHT and SCOPE_WEIGHTS constants for how parts of the record are weighted in the resulting document.

Per the OBO 1.4 spec, synonyms can have scopes in {EXACT, BROAD, NARROW, RELATED}. If no scope is given, it is treated as RELATED.

Parameters:
  • name (str) –

    The primary name of the term.

  • syns (list[SynonymEntry]) –

    List of synonym entries of the form {"text": str, "scope": str}; "scope" may be missing or None, in which case RELATED is assumed.

Returns:
  • str

    A string suitable for BM25 indexing.
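One common way to weight parts of a record for a bag-of-words ranker such as BM25+ is to repeat them in the indexed text, which raises their term frequency. The sketch below assumes that approach; the values given to NAME_WEIGHT and SCOPE_WEIGHTS and the name `build_doc_text` are hypothetical, and the actual function may weight the record differently.

```python
# Hypothetical weights: the real NAME_WEIGHT and SCOPE_WEIGHTS may differ.
NAME_WEIGHT = 3
SCOPE_WEIGHTS = {"EXACT": 2, "NARROW": 1, "BROAD": 1, "RELATED": 1}


def build_doc_text(name, syns):
    """Concatenate the name and synonyms, repeating each according to its weight."""
    parts = [name] * NAME_WEIGHT
    for syn in syns:
        # Per OBO 1.4, a synonym without an explicit scope is treated as RELATED;
        # `or` also covers an explicit None.
        scope = syn.get("scope") or "RELATED"
        parts.extend([syn["text"]] * SCOPE_WEIGHTS[scope])
    return " ".join(parts)
```

For example, `build_doc_text("T cell", [{"text": "T lymphocyte", "scope": "EXACT"}, {"text": "T-cell"}])` yields the name three times, the EXACT synonym twice, and the unscoped synonym once.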

Bases: TypedDict

A single synonym and its scope.

Attributes:
  • text (str) –

    The synonym text.

  • scope (NotRequired[Literal['EXACT', 'NARROW', 'BROAD', 'RELATED']]) –

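    The OBO synonym scope, describing how closely the text matches the term's meaning. It determines how heavily the synonym is weighted (see SCOPE_WEIGHTS); if omitted, RELATED is assumed.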
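A reconstruction of this TypedDict from the attributes documented above, with two example entries, may help show how the optional scope behaves. The class body is inferred from this page rather than copied from the source, and `effective_scope` is a hypothetical helper.

```python
try:
    from typing import Literal, NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older Pythons need typing_extensions
    from typing_extensions import Literal, NotRequired, TypedDict


class SynonymEntry(TypedDict):
    text: str
    # The key may be absent entirely; it is not Optional[...].
    scope: NotRequired[Literal["EXACT", "NARROW", "BROAD", "RELATED"]]


exact: SynonymEntry = {"text": "T lymphocyte", "scope": "EXACT"}
unscoped: SynonymEntry = {"text": "T-cell"}  # scope omitted


def effective_scope(entry: SynonymEntry) -> str:
    """Hypothetical helper: apply the OBO 1.4 default of RELATED."""
    return entry.get("scope", "RELATED")
```

Because `scope` is `NotRequired`, a type checker accepts entries without it, so code reading an entry should use `.get` rather than indexing.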