Annotations in long format.

Exists to support modularity and readability within the Query class.

Attributes:
  • annotations (DataFrame) –

    DataFrame with columns storing accession IDs, plus id and value columns that can each hold multiple annotations for a single entry.

column_intersection_with(columns)

Finds the intersection between the given columns and the columns of the annotations attribute.

Parameters:
  • columns (list[str]) –

    Any list of potential columns in the DataFrame.

Returns:
  • list[str]

    The columns that appear in both the input list and the annotations DataFrame.
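
The intersection described here can be illustrated with a minimal plain-Python sketch. The function name `column_intersection` is hypothetical, and keeping the order of the requested list is an assumption; the source does not say how the result is ordered.

```python
def column_intersection(requested: list[str], available: list[str]) -> list[str]:
    """Return the requested columns that also exist in `available`.

    Hypothetical stand-in for column_intersection_with. Ordering by the
    requested list is an assumption, not documented behavior.
    """
    available_set = set(available)  # set lookup keeps membership checks cheap
    return [col for col in requested if col in available_set]


present = column_intersection(
    ["sample", "series", "group"],
    ["sample", "series", "platform", "id", "value"],
)
print(present)  # ['sample', 'series']
```

A helper like this lets callers pass a broad list of candidate ID columns without first checking which ones the DataFrame actually contains.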

filter_na(column)

Removes entries in a column that are NA-like values (e.g., 'NA' or 'none'). Updates the annotations attribute in place.

Parameters:
  • column (str) –

    The name of a column in the DataFrame.
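
As a rough illustration of NA-like filtering, the sketch below works on plain Python dictionaries rather than a DataFrame. The `NA_LIKE` set and the case-insensitive comparison are assumptions: the source names only 'NA' and 'none' as examples, and `filter_na_rows` is a hypothetical helper, not the library method.

```python
# Assumed set of NA-like markers; the source lists only 'NA' and 'none' as examples.
NA_LIKE = {"na", "none"}


def filter_na_rows(rows: list[dict[str, str]], column: str) -> list[dict[str, str]]:
    """Keep rows whose `column` value is not NA-like (case-insensitive by assumption)."""
    return [
        row for row in rows
        if row.get(column) is not None
        and row[column].strip().lower() not in NA_LIKE
    ]


rows = [
    {"sample": "GSM1", "id": "UBERON:0000948"},
    {"sample": "GSM2", "id": "NA"},
    {"sample": "GSM3", "id": "none"},
]
kept = filter_na_rows(rows, "id")
print([row["sample"] for row in kept])  # ['GSM1']
```

Unlike filter_na, this helper returns a new list instead of updating an attribute in place.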

stage_anchor(anchor)

Filters NA values from the anchor annotations column.

Parameters:
  • anchor (Literal['id', 'value']) –

    The column storing desired format of annotations.

stage_level(level)

Filters NA values from the specified ID level column. If level is 'group', then it will also remove annotations with index IDs.

Parameters:
  • level (Literal['sample', 'series']) –

    Annotation level.

stage(level, anchor)

Stages the annotations DataFrame to be converted to wide format. Mutates the annotations attribute in place.

Parameters:
  • level (Literal['sample', 'series']) –

    Annotation level.

  • anchor (Literal['id', 'value']) –

    The column storing desired format of annotations.
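
One way to picture staging is as two in-place filters applied in sequence, first on the level column and then on the anchor column. The sketch below assumes that ordering and uses a hypothetical `StagedRows` class backed by a list of dictionaries; it does not reproduce the additional index-ID handling mentioned under stage_level.

```python
from typing import Literal

NA_LIKE = {"na", "none"}  # assumed NA-like markers


class StagedRows:
    """Hypothetical stand-in for LongAnnotations, backed by a list of dict rows."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.annotations = rows

    def _drop_na(self, column: str) -> None:
        # Rebind the attribute so callers see only the filtered rows afterwards.
        self.annotations = [
            row for row in self.annotations
            if row[column].strip().lower() not in NA_LIKE
        ]

    def stage(
        self,
        level: Literal["sample", "series"],
        anchor: Literal["id", "value"],
    ) -> None:
        self._drop_na(level)   # plays the role of stage_level
        self._drop_na(anchor)  # plays the role of stage_anchor


staged = StagedRows([
    {"sample": "GSM1", "series": "GSE1", "id": "UBERON:0000948"},
    {"sample": "NA", "series": "GSE1", "id": "UBERON:0002113"},
    {"sample": "GSM3", "series": "GSE2", "id": "none"},
])
staged.stage(level="sample", anchor="id")
print(staged.annotations)
```

Because stage returns nothing and rewrites the attribute, any reference taken to the annotations before the call still points at the unfiltered rows.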

pivot_wide(level, anchor, id_cols)

Pivots the annotations to wide format with one-hot-encoded binary entries for each annotation.

Parameters:
  • level (Literal['sample', 'series']) –

    Annotation level.

  • anchor (Literal['id', 'value']) –

    The column storing desired format of annotations.

  • id_cols (list[str]) –

    Columns to keep as IDs when pivoting.

Returns:
  • DataFrame

    Annotations in one-hot-encoded wide format with the accession IDs for each annotation.

Examples:

>>> import polars as pl
>>> from metahq_core.query import LongAnnotations
>>> anno = pl.DataFrame({
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'platform': ['GPL1', 'GPL2', 'GPL2'],
        'id': ['UBERON:0000948|UBERON:0002349', 'UBERON:0002113', 'UBERON:0000955'],
        'value': ['heart|myocardium', 'kidney', 'brain'],
    })
>>> anno = LongAnnotations(anno)
>>> anno.pivot_wide(
        level='sample', anchor='id', id_cols=['sample', 'series']
    )
┌────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┐
│ series ┆ sample ┆ UBERON:0000948 ┆ UBERON:0002349 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i32            ┆ i32            ┆ i32            ┆ i32            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╪════════════════╡
│ GSE1   ┆ GSM1   ┆ 1              ┆ 1              ┆ 0              ┆ 0              │
│ GSE1   ┆ GSM2   ┆ 0              ┆ 0              ┆ 1              ┆ 0              │
│ GSE2   ┆ GSM3   ┆ 0              ┆ 0              ┆ 0              ┆ 1              │
└────────┴────────┴────────────────┴────────────────┴────────────────┴────────────────┘
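
The one-hot encoding in the table above can be approximated in plain Python to make the transformation explicit. This is a sketch under stated assumptions, not the library's implementation: `pivot_wide_rows` is a hypothetical function, the '|' delimiter is inferred from the example data, term columns are ordered by first appearance, and the `level` parameter and NA staging are omitted.

```python
def pivot_wide_rows(rows, anchor, id_cols, sep="|"):
    """One-hot encode the delimited terms in `anchor`, keeping `id_cols` as IDs."""
    # Collect every distinct term in first-seen order to fix the column order.
    terms = []
    for row in rows:
        for term in row[anchor].split(sep):
            if term not in terms:
                terms.append(term)

    wide = []
    for row in rows:
        present = set(row[anchor].split(sep))
        out = {col: row[col] for col in id_cols}
        # 1 if the entry carries the term, 0 otherwise.
        out.update({term: int(term in present) for term in terms})
        wide.append(out)
    return wide


rows = [
    {"sample": "GSM1", "series": "GSE1", "id": "UBERON:0000948|UBERON:0002349"},
    {"sample": "GSM2", "series": "GSE1", "id": "UBERON:0002113"},
    {"sample": "GSM3", "series": "GSE2", "id": "UBERON:0000955"},
]
wide = pivot_wide_rows(rows, anchor="id", id_cols=["sample", "series"])
for record in wide:
    print(record)
```

Each input row yields exactly one output row here, which assumes the ID columns already identify entries uniquely.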