A class for storing and operating on ID columns for tabular data. It is designed to act as an index for polars.DataFrame objects, which have no built-in row index to anchor or track rows.

Attributes:
  • data (DataFrame) –

    DataFrame containing ID columns (index, group, platform, etc.)

  • index_col (str) –

    Name of the column that contains the primary index IDs.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.index import Ids
>>> ids = pl.DataFrame({
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "platform": ["GPL10", "GPL10", "GPL23"],
    })
>>> ids = Ids.from_dataframe(ids, index_col="sample")

columns property

Returns columns of self.data. Wrapper for polars.DataFrame.columns.

index property

Get the index column as a Series.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.index import Ids
>>> ids = pl.DataFrame({
        "sample": ["GSM1", "GSM2", "GSM3"],
        "series": ["GSE1", "GSE1", "GSE2"],
        "platform": ["GPL10", "GPL10", "GPL23"],
    })
>>> Ids.from_dataframe(ids, index_col="sample").index
shape: (3,)
Series: 'sample' [str]
[
        "GSM1"
        "GSM2"
        "GSM3"
]

filter_by_mask(mask)

Filter the ids DataFrame using a boolean mask.

Parameters:
  • mask (ndarray) –

    Boolean array with one entry per row; rows where the value is True are kept.

Examples:

>>> import numpy as np
>>> import polars as pl
>>> from metahq_core.curations.index import Ids
>>> ids = pl.DataFrame({
    "sample": ["GSM1", "GSM2", "GSM3"],
    "series": ["GSE1", "GSE1", "GSE2"],
    "platform": ["GPL10", "GPL10", "GPL23"],
    })
>>> ids = Ids.from_dataframe(ids, index_col="sample")
>>> ids.filter_by_mask(np.array([False, True, True])).data
┌────────┬────────┬──────────┐
│ sample ┆ series ┆ platform │
│ ---    ┆ ---    ┆ ---      │
│ str    ┆ str    ┆ str      │
╞════════╪════════╪══════════╡
│ GSM2   ┆ GSE1   ┆ GPL10    │
│ GSM3   ┆ GSE2   ┆ GPL23    │
└────────┴────────┴──────────┘

lazy()

Wrapper for polars.DataFrame.lazy().

Returns:
  • LazyFrame

    A polars.LazyFrame object of the data attribute.

to_numpy()

Wrapper for polars.DataFrame.to_numpy().

Returns:
  • ndarray

    The data attribute as a numpy ndarray.

from_dataframe(df, index_col) classmethod

Creates an Ids object from a polars DataFrame.

Parameters:
  • df (DataFrame) –

    A polars.DataFrame object with at least one column.

  • index_col (str) –

    The name of the column in df that should be treated as the index of the DataFrame.

Returns:
  • Ids

    An initialized Ids object.

Examples:

>>> import polars as pl
>>> from metahq_core.curations.index import Ids
>>> ids = pl.DataFrame({
        "sample": ["GSM1", "GSM2", "GSM3"],
        "series": ["GSE1", "GSE1", "GSE2"],
        "platform": ["GPL10", "GPL10", "GPL23"],
    })
>>> Ids.from_dataframe(ids, index_col="sample")

__getitem__(idx)

Slice the Ids frame with various indexing methods.

__len__()

Return the number of rows.