Bases: BaseProcessor
Processor for CellO automated cell type annotations.
CellO provides automated cell type predictions using the Cell Ontology (CL) for bulk RNA-seq samples. The raw source is a JSON object keyed by SRX accession with a list of CL term IDs as the value for each sample.
The processor explodes the per-sample term lists into one row per (sample, CL term), resolves each CL ID to a human-readable name via the CL OBO file, and drops any terms that cannot be resolved.
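The explode, resolve, and drop steps described above can be sketched in plain Python. This is a minimal illustration, not the processor's actual code: the function name `explode_annotations`, the output keys `sample_id`, `cl_id`, and `cl_name`, and the placeholder ID `CL:9999999` are all assumptions made for the example, and the ID-to-name mapping is passed in as a ready-made dict.

```python
import json


def explode_annotations(raw_json, cl_names):
    """Return one dict per (sample, CL term), skipping unresolved terms.

    Hypothetical helper: the name and the output keys are assumptions.
    """
    samples = json.loads(raw_json)  # {SRX accession: [CL term IDs]}
    rows = []
    for srx, term_ids in samples.items():
        for term_id in term_ids:
            name = cl_names.get(term_id)
            if name is None:
                continue  # term cannot be resolved to a name, so drop it
            rows.append({"sample_id": srx, "cl_id": term_id, "cl_name": name})
    return rows


# Toy input in the shape the source describes: SRX accession -> list of CL IDs.
# "CL:9999999" is a made-up ID used only to show an unresolved term.
raw = '{"SRX0000001": ["CL:0000084", "CL:9999999"], "SRX0000002": ["CL:0000236"]}'
names = {"CL:0000084": "T cell", "CL:0000236": "B cell"}

rows = explode_annotations(raw, names)
for row in rows:
    print(row)
```

In this toy input, two of the three terms resolve, so the result has two rows and the unresolved term is silently dropped.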
process(output_dir, **kwargs)
Process the CellO JSON file into standardized sample annotations.
Reads the JSON source, builds one row per (sample, CL term), maps each CL ID to its human-readable label via the CL OBO file, drops unmapped terms, and writes the result to a Parquet file.
| Parameters: |
|
|---|
| Returns: |
|
|---|
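The CL ID to label mapping that `process` relies on could be built from the OBO file roughly as follows. This is a sketch under stated assumptions: `parse_obo_names` is a hypothetical helper, it reads only the `id:` and `name:` tags of `[Term]` stanzas, and the processor itself may well use a dedicated OBO library instead.

```python
def parse_obo_names(obo_text):
    """Map term IDs to names from OBO-format text (hypothetical helper).

    Only [Term] stanzas are read; other stanzas such as [Typedef] are skipped.
    """
    names = {}
    in_term = False
    current_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("name: "):
            names[current_id] = line[len("name: "):]
    return names


# A tiny hand-written OBO fragment for illustration.
sample_obo = """format-version: 1.2

[Term]
id: CL:0000084
name: T cell

[Term]
id: CL:0000236
name: B cell

[Typedef]
id: part_of
name: part of
"""

cl_names = parse_obo_names(sample_obo)
print(cl_names)
```

Any CL ID missing from the resulting dict would count as unmapped and be dropped before the output is written.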
validate(data)
Validate that processed CellO data meets minimum requirements.
| Parameters: |
|
|---|
| Returns: |
|
|---|
| Raises: |
|
|---|
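The source does not say which minimum requirements `validate` enforces. The sketch below only illustrates the general shape of such a check over a list of row dicts; every rule in it (non-empty data, the keys `sample_id`, `cl_id`, and `cl_name`, a `CL:` prefix followed by seven digits, and raising `ValueError`) is an assumption made for the example.

```python
import re

# Assumed conventions for this sketch, not taken from the processor itself.
CL_ID_PATTERN = re.compile(r"^CL:\d{7}$")
REQUIRED_KEYS = {"sample_id", "cl_id", "cl_name"}


def validate_rows(rows):
    """Return True if rows pass the assumed checks, else raise ValueError."""
    if not rows:
        raise ValueError("no annotation rows to validate")
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            raise ValueError(f"row {i} is missing keys: {sorted(missing)}")
        if not CL_ID_PATTERN.match(row["cl_id"]):
            raise ValueError(f"row {i} has a malformed CL ID: {row['cl_id']!r}")
        if not row["cl_name"]:
            raise ValueError(f"row {i} has an empty CL name")
    return True


good = [{"sample_id": "SRX0000001", "cl_id": "CL:0000084", "cl_name": "T cell"}]
print(validate_rows(good))
```

Raising on the first failure keeps the example short; a real validator might instead collect every problem and report them together.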