Labels

Bases: BaseCuration

Class for storing and mutating labels.

Currently supports -1, 0, +1 labels.
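
As a minimal sketch of what the three-value constraint implies, the plain-Python helper below reports any values outside -1, 0, +1 in a set of entity columns. `invalid_labels` is a hypothetical name used only for illustration; whether `Labels` performs such a check itself is not stated here.

```python
ALLOWED_LABELS = {-1, 0, 1}


def invalid_labels(columns):
    """Return, per entity column, the sorted values that fall outside -1, 0, +1."""
    bad = {}
    for name, values in columns.items():
        extra = sorted(set(values) - ALLOWED_LABELS)
        if extra:
            bad[name] = extra
    return bad


# Illustrative data only: the last column contains an unsupported value (2).
columns = {
    "UBERON:0000948": [1, -1, -1],
    "UBERON:0002113": [-1, 1, 0],
    "UBERON:0000955": [-1, 2, 1],
}
print(invalid_labels(columns))  # {'UBERON:0000955': [2]}
```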

Attributes:

data (DataFrame) – Polars DataFrame with an index column, group columns, and one column per attribute entity (e.g. male or female, tissues, diseases, etc.).

index_col (str) – Name of the column of `data` that contains the index IDs.

group_cols (tuple[str, ...]) – Names of the columns of `data` that contain an ID for each index indicating whether it belongs to a particular group (e.g. dataset, sex, platform, etc.).

collapsed (bool) – Indicates whether the annotations have already been collapsed.

`entities` `property`

Returns the names of the entity columns of the Labels frame, excluding the index and group columns.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.entities
['UBERON:0000948', 'UBERON:0002113', 'UBERON:0000955']

`groups` `property`

Returns the groups column of the Labels curation.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.groups
['GSE1', 'GSE1', 'GSE2']

`ids` `property`

Returns the IDs DataFrame, which holds the index and group columns.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.ids
┌────────┬────────┐
│ sample ┆ series │
│ ---    ┆ ---    │
│ str    ┆ str    │
╞════════╪════════╡
│ GSM1   ┆ GSE1   │
│ GSM2   ┆ GSE1   │
│ GSM3   ┆ GSE2   │
└────────┴────────┘

`index` `property`

Returns the index column as a list.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.index
['GSM1', 'GSM2', 'GSM3']

`n_indices` `property`

Returns the number of indices.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.n_indices
3

`n_entities` `property`

Returns the number of entities.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
...     'UBERON:0002107': [-1, -1, -1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.n_entities
4

`unique_groups` `property`

Returns the unique groups.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.unique_groups
['GSE1', 'GSE2']

`add_ids(new)`

Append new group ID columns to the IDs of a Labels object. The new IDs must have a matching index.

Parameters:	`new` (`DataFrame`) – A DataFrame of additional IDs to join with the current index column of `data`. It must contain the same index column as the original `data`.

Returns:	`Labels` – A new Labels object including the new ID columns.
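
The index matching described above can be sketched in plain Python with columns held as dicts of lists. This is an illustration of the idea only, not the library's implementation: `add_id_columns` and the `platform` values are hypothetical, and raising on unmatched index IDs is an assumption of the sketch.

```python
def add_id_columns(ids, new, index_col):
    """Return a copy of `ids` with the extra columns of `new`, aligned on `index_col`."""
    # Map each index ID in `new` to its row position.
    position = {key: row for row, key in enumerate(new[index_col])}
    missing = [key for key in ids[index_col] if key not in position]
    if missing:
        # Assumption: unmatched index IDs are treated as an error in this sketch.
        raise ValueError(f"index IDs missing from new: {missing}")
    # Copy the existing columns so the input is left unchanged.
    combined = {name: list(values) for name, values in ids.items()}
    for name, values in new.items():
        if name != index_col:
            # Reorder the new column to follow the row order of `ids`.
            combined[name] = [values[position[key]] for key in ids[index_col]]
    return combined


ids = {"sample": ["GSM1", "GSM2", "GSM3"], "series": ["GSE1", "GSE1", "GSE2"]}
# Hypothetical extra ID column, deliberately given in a different row order.
new = {"sample": ["GSM3", "GSM1", "GSM2"], "platform": ["GPL3", "GPL1", "GPL2"]}
result = add_id_columns(ids, new, index_col="sample")
print(result["platform"])  # ['GPL1', 'GPL2', 'GPL3']
```

The sketch returns a new mapping and leaves `ids` untouched, mirroring the documented behavior of returning a new Labels object.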

`drop(*args, **kwargs)`

Wrapper for the polars `drop` method. Drops any of the term columns. ID columns are not dropped through this method.
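
A minimal plain-Python sketch of the rule that ID columns survive a drop. `drop_entities` is a hypothetical helper, and silently skipping a request to drop an ID column is an assumption; the actual method may behave differently in that case.

```python
def drop_entities(columns, id_cols, to_drop):
    """Return a copy of `columns` without the requested entity columns."""
    protected = set(id_cols)
    # Assumption: requests to drop an ID column are silently skipped here.
    removable = {name for name in to_drop if name not in protected}
    return {name: values for name, values in columns.items() if name not in removable}


columns = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
    "UBERON:0002113": [-1, 1, -1],
}
kept = drop_entities(
    columns, id_cols=["sample", "series"], to_drop=["series", "UBERON:0002113"]
)
print(list(kept))  # ['sample', 'series', 'UBERON:0000948']
```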

`filter(condition)`

Filter both data and ids simultaneously using a mask.

Parameters:	`condition` (`Expr`) – Polars expression used to filter rows.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col="sample", group_cols=["series"])
>>> labels.filter(pl.col("UBERON:0000948") == 1)
┌────────┬────────┬────────────────┬────────────────┬────────────────┐
│ sample ┆ series ┆ UBERON:0000948 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i32            ┆ i32            ┆ i32            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╡
│ GSM1   ┆ GSE1   ┆ 1              ┆ -1             ┆ -1             │
└────────┴────────┴────────────────┴────────────────┴────────────────┘

`head(*args, **kwargs)`

Wrapper for the polars `head` method.

`save(outfile, fmt, attribute, level, citation_config, metadata=None)`

Save the labels curation.

Parameters:

outfile (str | Path) – Path to the output file.

fmt (Literal['json', 'parquet', 'csv', 'tsv']) – File format to save to.

attribute (str) – A supported MetaHQ annotated attribute.

level (str) – An index level supported by MetaHQ.

citation_config (CitationConfig) – Parameters for saving citations.

metadata (str | None, default: None) – Metadata fields to include, formatted as a comma-delimited string.

Examples:

If `metadata` is None, only the index column is saved alongside the labels.

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> from metahq_core.export.references import CitationConfig
>>> config = CitationConfig(
...     '1.0.1', 'tissue', 'sample', 'human', 'expert', 'rnaseq', 'label', '2026-04-20'
... )
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> labels = Labels.from_df(anno, index_col='sample', group_cols=['series'])
>>> labels.save(
...     '/path/to/out.parquet', fmt="parquet", attribute="tissue", level="sample",
...     citation_config=config,
... )
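
Since `metadata` is passed as a single comma-delimited string, the sketch below shows one way such a string could be split into field names. `parse_metadata` and the field names `series` and `platform` are hypothetical; how `save` parses the string internally is not documented here.

```python
def parse_metadata(metadata):
    """Split a comma-delimited metadata string into a list of field names."""
    if metadata is None:
        # No metadata requested: nothing beyond the index column and labels.
        return []
    # Strip surrounding whitespace and ignore empty entries such as 'a,,b'.
    return [field.strip() for field in metadata.split(",") if field.strip()]


print(parse_metadata("series, platform"))  # ['series', 'platform']
print(parse_metadata(None))  # []
```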

`select(*args, **kwargs)`

Selects label entity columns while keeping the ID columns.
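
The behavior can be pictured with a small plain-Python sketch in which the ID columns are always carried along with the requested entities. `select_entities` is a hypothetical helper, and placing the ID columns first is an assumption about ordering.

```python
def select_entities(columns, id_cols, wanted):
    """Return the ID columns plus the requested entity columns."""
    # ID columns are always kept; requested names that are IDs are not repeated.
    keep = list(id_cols) + [name for name in wanted if name not in id_cols]
    return {name: columns[name] for name in keep}


columns = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
    "UBERON:0000955": [-1, -1, 1],
}
selected = select_entities(columns, ["sample", "series"], ["UBERON:0000955"])
print(list(selected))  # ['sample', 'series', 'UBERON:0000955']
```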

`slice(offset, length=None)`

Slice both data and ids simultaneously using polars slice.

Parameters:

`offset` (`int`) – Index position to begin the slice.

`length` (`int | None`, default: `None`) – Number of indices past `offset` to slice out.

Returns:	`Labels` – Sliced Labels object as a subset of the original Labels.
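
The `offset` and `length` semantics can be illustrated with a plain-Python sketch that slices every column by the same row range, so IDs and labels stay aligned. `slice_columns` is a hypothetical helper and handles non-negative offsets only.

```python
def slice_columns(columns, offset, length=None):
    """Slice every column by the same row range (non-negative offsets only)."""
    # With no length, take every row from `offset` to the end.
    end = None if length is None else offset + length
    return {name: values[offset:end] for name, values in columns.items()}


columns = {
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "UBERON:0000948": [1, -1, -1],
}
print(slice_columns(columns, 1, 1)["sample"])  # ['GSM2']
print(slice_columns(columns, 1)["sample"])  # ['GSM2', 'GSM3']
```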

`to_numpy()`

Wrapper for the polars `to_numpy` method.

`from_df(df, index_col, sources_col, group_cols, **kwargs)` `classmethod`

Creates a Labels object from a combined DataFrame.

Parameters:

df (DataFrame) – Polars DataFrame with index and group ID columns and one column per attribute entity (e.g. male or female, tissues, diseases, etc.).

index_col (str) – Name of the column of `df` that contains the index IDs.

group_cols (tuple[str, ...]) – Names of the columns of `df` that contain an ID for each index indicating whether it belongs to a particular group (e.g. dataset, sex, platform, etc.).

Returns:	`Labels` – A Labels object constructed from `df`.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.labels import Labels
>>> anno = pl.DataFrame({
...     'sample': ['GSM1', 'GSM2', 'GSM3'],
...     'series': ['GSE1', 'GSE1', 'GSE2'],
...     'UBERON:0000948': [1, -1, -1],
...     'UBERON:0002113': [-1, 1, -1],
...     'UBERON:0000955': [-1, -1, 1],
... })
>>> Labels.from_df(anno, index_col='sample', group_cols=['series'])
┌────────┬────────┬────────────────┬────────────────┬────────────────┐
│ sample ┆ series ┆ UBERON:0000948 ┆ UBERON:0002113 ┆ UBERON:0000955 │
│ ---    ┆ ---    ┆ ---            ┆ ---            ┆ ---            │
│ str    ┆ str    ┆ i64            ┆ i64            ┆ i64            │
╞════════╪════════╪════════════════╪════════════════╪════════════════╡
│ GSM1   ┆ GSE1   ┆ 1              ┆ -1             ┆ -1             │
│ GSM2   ┆ GSE1   ┆ -1             ┆ 1              ┆ -1             │
│ GSM3   ┆ GSE2   ┆ -1             ┆ -1             ┆ 1              │
└────────┴────────┴────────────────┴────────────────┴────────────────┘
