Bases: BaseProcessor
Processor for Golightly (2018) clinical sample annotations.
Reads a ZIP archive containing per-study clinical text files. Each file
has a fixed tissue assignment (from _FILE_METADATA) and optionally
age and sex columns. Age range strings (e.g. "40-50") are averaged to
their midpoint (45 in this example) before being converted to an age group.
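The age-range averaging described above can be sketched as follows. This is only an illustration: the helper name `age_range_midpoint`, the regular expression, and the handling of single values and unparseable strings are assumptions, not the processor's actual code. The subsequent mapping from a numeric age to an age group is not shown because its bin boundaries are not documented here.

```python
import re
from typing import Optional

# Hypothetical pattern: two non-negative numbers separated by a hyphen.
_RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def age_range_midpoint(value: str) -> Optional[float]:
    """Return the midpoint of a range such as "40-50".

    A single number is returned unchanged as a float; anything that
    cannot be parsed yields None (an assumed convention).
    """
    match = _RANGE_RE.match(value)
    if match:
        low, high = (float(group) for group in match.groups())
        return (low + high) / 2.0
    try:
        return float(value)
    except ValueError:
        return None
```

For example, `age_range_midpoint("40-50")` evaluates to `45.0`, matching the range mentioned in the description.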
Process Golightly annotations into a standardized format.
Parameters:
- output_dir (Path, default: PROCESSED_DIR) – Directory for processed output.
- **kwargs (Any, default: {}) – input_path (Path): override the ZIP file location.

Returns:
- DataFrame – Standardized annotations with columns sample_id, annotation_type, term_id, term_label, and ecode.
Validate processed Golightly data.
Parameters:
- data (DataFrame) – Processed annotations to validate.

Returns:
- bool – True if validation passes.

Raises:
- ValidationError – If required columns are missing or tissue annotations are absent.
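The validation contract above (required columns present, at least one tissue annotation) might look roughly like the sketch below. Several details are assumptions made for illustration: the function name `validate_annotations`, the local `ValidationError` stand-in for the project's own exception class, and the guess that tissue rows are identified by `annotation_type == "tissue"`.

```python
import pandas as pd

# Column names taken from the documented return value of the processing step.
REQUIRED_COLUMNS = ["sample_id", "annotation_type", "term_id", "term_label", "ecode"]


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


def validate_annotations(data: pd.DataFrame) -> bool:
    """Return True if `data` passes both checks, else raise ValidationError."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        raise ValidationError(f"Missing required columns: {missing}")

    # Assumption: tissue annotations carry annotation_type == "tissue".
    if not (data["annotation_type"] == "tissue").any():
        raise ValidationError("No tissue annotations found")

    return True
```

Raising on failure rather than returning False mirrors the documented behaviour, where the boolean result is only ever True and problems surface as exceptions.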