Bases: BaseCuration

Class for storing and mutating labels.

Currently supports -1, 0, +1 labels.

Attributes:
  • data (DataFrame) –

    Polars DataFrame with an index column, group columns, and one column per attribute entity (e.g. male or female, tissues, diseases, etc.) holding the label for each index.

  • index_col (str) –

    Name of the column of data that contains the index IDs.

  • group_cols (tuple[str, ...]) –

    Names of columns of data that contain an ID for each index indicating if it belongs to a particular group (e.g. dataset, sex, platform, etc.).

  • collapsed (bool) –

    Indicates if the annotations have already been collapsed.

entities property

Returns the entity column names of the Labels frame. ID columns (index and groups) are not included.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.entities
['UBERON:0000948', 'UBERON:0002113', 'UBERON:0000955']

groups property

Returns the groups column of the Labels curation.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.groups
['GSE1', 'GSE1', 'GSE2']

ids property

Return the IDs dataframe.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.ids
┌────────┬────────┐
│ sample ┆ series │
│ ---    ┆ ---    │
│ str    ┆ str    │
╞════════╪════════╡
│ GSM1   ┆ GSE1   │
│ GSM2   ┆ GSE1   │
│ GSM3   ┆ GSE2   │
└────────┴────────┘

index property

Return the index column as a list.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.index
['GSM1', 'GSM2', 'GSM3']

n_indices property

Returns number of indices.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.n_indices
3

n_entities property

Returns number of entities.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
        'UBERON:0002107': [-1, -1, -1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.n_entities
4

unique_groups property

Returns unique groups.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.unique_groups
['GSE1', 'GSE2']

add_ids(new)

Append new group ID columns to the IDs of a Labels object. The new IDs must have a matching index.

Parameters:
  • new (DataFrame) –

    A DataFrame of additional IDs to join with the current index column of data. It must contain the same index column as the original data.

Returns:
  • Labels

    A new Labels object including the new ID columns.
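
The matching-index requirement can be illustrated with a plain-Python sketch. It stores columns as a dict of lists instead of a Polars DataFrame, and `add_id_columns` is a hypothetical helper, not part of MetaHQ; the platform values are placeholders. The library's actual join logic may differ.

```python
def add_id_columns(ids, new, index_col):
    """Return a copy of `ids` with the extra columns of `new` aligned on `index_col`.

    Hypothetical helper for illustration only; tables are dicts of lists.
    """
    if sorted(ids[index_col]) != sorted(new[index_col]):
        raise ValueError("new IDs must cover the same index values")
    # Map each index value to its row position in `new`.
    position = {value: i for i, value in enumerate(new[index_col])}
    # Row order of `new` that lines up with the rows of `ids`.
    order = [position[value] for value in ids[index_col]]
    merged = {name: list(values) for name, values in ids.items()}
    for name, values in new.items():
        if name != index_col:
            merged[name] = [values[i] for i in order]
    return merged


ids = {"sample": ["GSM1", "GSM2", "GSM3"], "series": ["GSE1", "GSE1", "GSE2"]}
# Rows of `new` arrive in a different order; only the index values must match.
new = {"sample": ["GSM3", "GSM1", "GSM2"], "platform": ["GPL3", "GPL1", "GPL2"]}
merged = add_id_columns(ids, new, index_col="sample")
print(merged["platform"])  # ['GPL1', 'GPL2', 'GPL3']
```

The original `ids` mapping is left unchanged, mirroring the fact that `add_ids` returns a new Labels object.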

drop(*args, **kwargs)

Wrapper for polars drop. Drops any of the term columns. ID columns are not dropped through this method.
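
A plain-Python sketch of the behavior described above, assuming columns are held in a dict of lists. `drop_entities` is a hypothetical helper, not part of MetaHQ. The text does not say whether the real method ignores ID column names or raises on them; this sketch ignores them.

```python
def drop_entities(columns, to_drop, id_cols):
    """Drop entity columns while leaving ID columns in place (hypothetical helper)."""
    protected = set(id_cols)
    # ID columns are never removable, even if requested.
    removable = {name for name in to_drop if name not in protected}
    return {name: values for name, values in columns.items() if name not in removable}


data = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
    "UBERON:0002113": [-1, 1, -1],
    "UBERON:0000955": [-1, -1, 1],
}
kept = drop_entities(data, ["UBERON:0002113", "sample"], id_cols=("sample", "series"))
print(list(kept))  # ['sample', 'series', 'UBERON:0000948', 'UBERON:0000955']
```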

filter(condition)

Filter both data and ids simultaneously using a mask.

Parameters:
  • condition (Expr) –

    Polars expression evaluated per row; rows for which it is true are kept.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col="sample", group_cols=["series"])
>>> labels.filter(pl.col("UBERON:0000948") == 1)
┌────────┬────────┬────────────────┬────────────────┬────────────────┐
│ sample ┆ series ┆ UBERON:0000948 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i32            ┆ i32            ┆ i32            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╡
│ GSM1   ┆ GSE1   ┆ 1              ┆ -1             ┆ -1             │
└────────┴────────┴────────────────┴────────────────┴────────────────┘

head(*args, **kwargs)

Wrapper for polars head function.

save(outfile, fmt, attribute, level, citation_config, metadata=None)

Save the labels curation.

Parameters:
  • outfile (str | Path) –

    Path to the output file.

  • fmt (Literal['json', 'parquet', 'csv', 'tsv']) –

    File format to save to.

  • attribute (str) –

    A supported MetaHQ annotated attribute.

  • level (str) –

    An index level supported by MetaHQ.

  • citation_config (CitationConfig) –

    Parameters for saving citations.

  • metadata (str | None, default: None ) –

    Metadata fields to include, formatted as a comma-delimited string.

Examples:

If `metadata` is None, only the index column and the label
columns are saved.

>>> from metahq_core.curations.labels import Labels
>>> from metahq_core.export.references import CitationConfig
>>> config = CitationConfig(
        '1.0.1', 'tissue', 'sample', 'human', 'expert', 'rnaseq', 'label', '2026-04-20'
    )
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
>>> labels.save(
        '/path/to/out.parquet',
        fmt="parquet",
        attribute="tissue",
        level="sample",
        citation_config=config,
    )

select(*args, **kwargs)

Select label entity columns while maintaining ids.
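
As a rough illustration of "maintaining ids", the sketch below keeps the ID columns in front of whichever entity columns are requested. It uses a dict of lists in place of a Polars DataFrame, and `select_entities` is a hypothetical helper, not the library's implementation.

```python
def select_entities(columns, wanted, id_cols):
    """Keep the ID columns plus the requested entity columns (hypothetical helper)."""
    # ID columns always come first; requested ID names are not duplicated.
    keep = list(id_cols) + [name for name in wanted if name not in id_cols]
    return {name: columns[name] for name in keep}


data = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
    "UBERON:0002113": [-1, 1, -1],
    "UBERON:0000955": [-1, -1, 1],
}
selected = select_entities(data, ["UBERON:0000955"], id_cols=("sample", "series"))
print(list(selected))  # ['sample', 'series', 'UBERON:0000955']
```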

slice(offset, length=None)

Slice both data and ids simultaneously using polars slice.

Parameters:
  • offset (int) –

    Index position to begin the slice.

  • length (int | None, default: None ) –

    Number of rows to take, starting at offset. If None, all rows from offset onward are selected.

Returns:
  • Labels

    Sliced Labels object as a subset of the original Labels.
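
The `offset` and `length` semantics can be sketched in plain Python by applying the same row window to every column, so labels and IDs stay aligned. `slice_columns` is a hypothetical helper operating on a dict of lists, and it assumes a non-negative `offset`.

```python
def slice_columns(columns, offset, length=None):
    """Apply one row window to every column (hypothetical helper).

    Assumes offset >= 0; length=None means "through the last row".
    """
    end = None if length is None else offset + length
    return {name: values[offset:end] for name, values in columns.items()}


data = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
}
one_row = slice_columns(data, offset=1, length=1)
tail = slice_columns(data, offset=1)
print(one_row["sample"])  # ['GSM2']
print(tail["sample"])     # ['GSM2', 'GSM3']
```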

to_numpy()

Wrapper for polars to_numpy.

from_df(df, index_col, sources_col, group_cols, **kwargs) classmethod

Creates a Labels object from a combined DataFrame.

Parameters:
  • df (DataFrame) –

    Polars DataFrame with index and group ID columns, plus one column per attribute entity (e.g. male or female, tissues, diseases, etc.) holding the label for each index.

  • index_col (str) –

    Name of the column of data that contains the index IDs.

  • group_cols (tuple[str, ...]) –

    Names of columns of data that contain an ID for each index indicating if it belongs to a particular group (e.g. dataset, sex, platform, etc.).

Returns:
  • Labels

    A Labels object constructed from df.

Examples:

>>> from metahq_core.curations.labels import Labels
>>> labels = {
        'sample': ['GSM1', 'GSM2', 'GSM3'],
        'series': ['GSE1', 'GSE1', 'GSE2'],
        'UBERON:0000948': [1, -1, -1],
        'UBERON:0002113': [-1, 1, -1],
        'UBERON:0000955': [-1, -1, 1],
    }
>>> labels = Labels.from_df(labels, index_col='sample', group_cols=['series'])
┌────────┬────────┬────────────────┬────────────────┬────────────────┐
│ sample ┆ series ┆ UBERON:0000948 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i64            ┆ i64            ┆ i64            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╡
│ GSM1   ┆ GSE1   ┆ 1              ┆ -1             ┆ -1             │
│ GSM2   ┆ GSE1   ┆ -1             ┆ 1              ┆ -1             │
│ GSM3   ┆ GSE2   ┆ -1             ┆ -1             ┆ 1              │
└────────┴────────┴────────────────┴────────────────┴────────────────┘