LongAnnotations

Annotations in long format.

Exists to support modularity and readability within the Query class.

Attributes:	`annotations` (`DataFrame`) – DataFrame whose columns store accession IDs (e.g., `sample`, `series`), along with `id` and `value` columns that can each store multiple `|`-delimited annotations for a single entry.

`column_intersection_with(columns)`

Finds the intersection between `columns` and the columns of the `annotations` DataFrame.

Parameters:	`columns` (`list[str]`) – Any list of potential columns in the DataFrame.

Returns:	`list[str]` – The intersection of columns.
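The intersection can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the function name `column_intersection` is hypothetical, the DataFrame's columns are passed in as a plain list, and preserving the order of `columns` is an assumption the documentation does not state.

```python
def column_intersection(columns: list[str], df_columns: list[str]) -> list[str]:
    """Return the entries of `columns` that also appear in `df_columns`.

    Hypothetical stand-in for LongAnnotations.column_intersection_with.
    The order of `columns` is preserved (an assumption).
    """
    available = set(df_columns)  # set gives O(1) membership checks
    return [col for col in columns if col in available]


print(column_intersection(
    ["sample", "series", "tissue"],
    ["sample", "series", "platform", "id", "value"],
))
```

Returning a list rather than a set keeps the result usable directly as an ordered column selection.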

`filter_na(column)`

Removes entries whose value in `column` is NA-like (e.g., 'NA' or 'none'). Updates the `annotations` attribute in place.

Parameters:	`column` (`str`) – The name of a column in the DataFrame.
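The in-place filtering can be modelled with a list of row dicts standing in for the DataFrame. This is a hypothetical sketch: `filter_na_inplace` and `is_na_like` are illustrative names, and the NA-like markers beyond 'NA' and 'none' (plus the case-insensitive match) are assumptions, not the library's actual list.

```python
# Assumed NA-like markers; only 'NA' and 'none' come from the documentation.
NA_LIKE = {"na", "none", "nan", "null", ""}


def is_na_like(value: object) -> bool:
    """True for missing values or strings matching an NA-like marker (case-insensitive)."""
    return value is None or str(value).strip().lower() in NA_LIKE


def filter_na_inplace(rows: list[dict], column: str) -> None:
    """Hypothetical model of filter_na: drop rows whose `column` value is NA-like.

    Slice assignment replaces the list's contents, so the caller's list is
    mutated in place, mirroring the documented in-place update.
    """
    rows[:] = [row for row in rows if not is_na_like(row.get(column))]


rows = [
    {"sample": "GSM1", "id": "UBERON:0000948"},
    {"sample": "GSM2", "id": "NA"},
    {"sample": "GSM3", "id": "none"},
    {"sample": "GSM4", "id": None},
]
filter_na_inplace(rows, "id")
print([row["sample"] for row in rows])  # ['GSM1']
```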

`stage_anchor(anchor)`

Filters NA values from the `anchor` column of the annotations.

Parameters:	`anchor` (`Literal['id', 'value']`) – The column storing the desired format of annotations.

`stage_level(level)`

Filters NA values from the ID column named by `level`. If `level` is 'group', it also removes annotations with index IDs.

Parameters:	`level` (`Literal['sample', 'series']`) – Annotation level.

`stage(level, anchor)`

Stages the annotations DataFrame to be converted to wide format. Mutates the annotations attribute in place.

Parameters:	`level` (`Literal['sample', 'series']`) – Annotation level.
	`anchor` (`Literal['id', 'value']`) – The column storing the desired format of annotations.
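Staging can be read as NA filtering on the level column followed by NA filtering on the anchor column. The sketch below models that composition on a list of row dicts; the `stage` function here is hypothetical, the call order and NA-like markers are assumptions, and the extra 'group' handling of index IDs is not modelled.

```python
from typing import Literal

# Assumed NA-like markers; only 'NA' and 'none' come from the documentation.
NA_LIKE = {"na", "none", "nan", "null", ""}


def _is_na_like(value: object) -> bool:
    return value is None or str(value).strip().lower() in NA_LIKE


def stage(
    rows: list[dict],
    level: Literal["sample", "series"],
    anchor: Literal["id", "value"],
) -> None:
    """Hypothetical model of LongAnnotations.stage, mutating `rows` in place.

    First drops rows with an NA-like level ID (like stage_level), then rows
    with an NA-like annotation in the anchor column (like stage_anchor).
    """
    for column in (level, anchor):  # assumed order: level, then anchor
        rows[:] = [row for row in rows if not _is_na_like(row.get(column))]


rows = [
    {"sample": "GSM1", "series": "GSE1", "id": "UBERON:0000948", "value": "heart"},
    {"sample": "NA", "series": "GSE1", "id": "UBERON:0002113", "value": "kidney"},
    {"sample": "GSM3", "series": "GSE2", "id": "none", "value": "brain"},
]
stage(rows, level="sample", anchor="id")
print([row["sample"] for row in rows])  # ['GSM1']
```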

`pivot_wide(level, anchor, id_cols)`

Pivots the annotations to wide format, with a one-hot-encoded binary column for each annotation.

Parameters:	`level` (`Literal['sample', 'series']`) – Annotation level.
	`anchor` (`Literal['id', 'value']`) – The column storing the desired format of annotations.
	`id_cols` (`list[str]`) – Columns to keep as IDs when pivoting.

Returns:	`DataFrame` – Annotations in one-hot-encoded wide format with the accession IDs for each annotation.

Examples:

>>> import polars as pl
>>> from metahq_core.query import LongAnnotations
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'platform': ['GPL1', 'GPL2', 'GPL2'],
...     'id': ['UBERON:0000948|UBERON:0002349', 'UBERON:0002113', 'UBERON:0000955'],
...     'value': ['heart|myocardium', 'kidney', 'brain'],
... })
>>> anno = LongAnnotations(anno)
>>> anno.pivot_wide(
...     level='sample', anchor='id', id_cols=['sample', 'series']
... )
┌────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┐
│ series ┆ sample ┆ UBERON:0000948 ┆ UBERON:0002349 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i32            ┆ i32            ┆ i32            ┆ i32            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╪════════════════╡
│ GSE1   ┆ GSM1   ┆ 1              ┆ 1              ┆ 0              ┆ 0              │
│ GSE1   ┆ GSM2   ┆ 0              ┆ 0              ┆ 1              ┆ 0              │
│ GSE2   ┆ GSM3   ┆ 0              ┆ 0              ┆ 0              ┆ 1              │
└────────┴────────┴────────────────┴────────────────┴────────────────┴────────────────┘
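The one-hot pivot in the example above can be modelled in plain Python to show the logic: split each anchor cell on `|`, collect every distinct annotation as a column, and mark membership with 0 or 1. This is a hypothetical sketch, not the library's polars implementation; it assumes one input row per `level` entity (so `level` is not needed) and orders annotation columns by first appearance, and it does not claim to reproduce the exact ID-column order of the real output.

```python
def pivot_wide(
    rows: list[dict[str, str]],
    anchor: str,
    id_cols: list[str],
    sep: str = "|",
) -> list[dict[str, object]]:
    """Hypothetical model of the one-hot pivot.

    Each `anchor` cell may hold several annotations joined by `sep`;
    every distinct annotation becomes a 0/1 column.
    """
    terms: list[str] = []  # annotation columns, in first-seen order
    split_rows = []
    for row in rows:
        labels = row[anchor].split(sep)
        for label in labels:
            if label not in terms:
                terms.append(label)
        split_rows.append((row, set(labels)))

    wide = []
    for row, labels in split_rows:
        record: dict[str, object] = {col: row[col] for col in id_cols}
        record.update({term: int(term in labels) for term in terms})
        wide.append(record)
    return wide


rows = [
    {"sample": "GSM1", "series": "GSE1", "id": "UBERON:0000948|UBERON:0002349"},
    {"sample": "GSM2", "series": "GSE1", "id": "UBERON:0002113"},
    {"sample": "GSM3", "series": "GSE2", "id": "UBERON:0000955"},
]
wide = pivot_wide(rows, anchor="id", id_cols=["sample", "series"])
for record in wide:
    print(record)
```

A polars implementation would more likely split and explode the anchor column and then pivot, but the row-by-row version makes the encoding explicit.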
