CellO

Bases: BaseProcessor

Processor for CellO automated cell type annotations.

CellO provides automated cell type predictions using the Cell Ontology (CL) for bulk RNA-seq samples. The raw source is a JSON object keyed by SRX accession, where each value is the list of CL term IDs assigned to that sample.

The processor explodes the per-sample term lists into one row per (sample, CL term), resolves each CL ID to a human-readable name via the CL OBO file, and drops any terms that cannot be resolved.
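The explode, resolve, and drop steps can be sketched in plain Python. This is an illustrative sketch only, not the processor's actual implementation: the sample accessions, the `cl_labels` lookup (a stand-in for the parsed CL OBO file), the `accession` key, and the helper name `explode_and_resolve` are all assumptions, and `CL:9999999` is a made-up ID used to show an unresolvable term.

```python
import json

# Hypothetical raw CellO payload: SRX accession -> list of CL term IDs.
raw_json = """
{
  "SRX000001": ["CL:0000084", "CL:0000542"],
  "SRX000002": ["CL:0000236", "CL:9999999"]
}
"""

# Hypothetical ID -> label lookup standing in for the parsed CL OBO file.
cl_labels = {
    "CL:0000084": "T cell",
    "CL:0000542": "lymphocyte",
    "CL:0000236": "B cell",
}


def explode_and_resolve(raw, labels):
    """Build one row per (sample, CL term), skipping terms with no label."""
    rows = []
    for accession, term_ids in raw.items():
        for term_id in term_ids:
            label = labels.get(term_id)
            if label is None:
                # Unresolvable term: dropped, mirroring the behaviour described above.
                continue
            rows.append(
                {"accession": accession, "term_id": term_id, "term_label": label}
            )
    return rows


rows = explode_and_resolve(json.loads(raw_json), cl_labels)
for row in rows:
    print(row)
```

In this sketch the second sample contributes a single row, because its made-up second term has no entry in the lookup.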

`process(output_dir, **kwargs)`

Process the CellO JSON file into standardized sample annotations.

Reads the JSON source, builds one row per (sample, CL term), maps each CL ID to its human-readable label via the CL OBO, drops unmapped terms, and writes the result to a parquet file.

Parameters:	`output_dir` (`Path`) – Directory where the processed parquet file will be written.
	`**kwargs` (`Any`, default: `{}`) – Supports `input_path` (`Path | str`), which overrides the default CellO JSON input path (defaults to `CELLO_JSON` from config).

Returns:	`DataFrame` – Standardized annotations with columns `COL_ACCESSION`, `annotation_type`, `term_id`, `term_label`, and `ecode`.

`validate(data)`

Validate that processed CellO data meets minimum requirements.

Parameters:	`data` (`DataFrame`) – Processed annotations DataFrame to validate.

Returns:	`bool` – True if validation passes.

Raises:	`ValidationError` – If required columns are missing.
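A required-column check of this kind might look like the sketch below. It is a hedged illustration, not the project's code: the `ValidationError` class here is a local stand-in, the helper `check_required_columns` is hypothetical, and the column list assumes the required columns match those returned by `process`, with `accession` standing in for whatever `COL_ACCESSION` resolves to.

```python
class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


# Assumed required columns; "accession" is a placeholder for COL_ACCESSION.
REQUIRED_COLUMNS = ("accession", "annotation_type", "term_id", "term_label", "ecode")


def check_required_columns(columns):
    """Return True if every required column is present, else raise."""
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValidationError(f"missing required columns: {missing}")
    return True


# With a pandas DataFrame, the argument would typically be `data.columns`.
print(check_required_columns(["accession", "annotation_type", "term_id", "term_label", "ecode"]))
```

Raising on failure rather than returning `False` matches the documented contract, in which the return value is `True` whenever the call completes.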
