Bases: BaseProcessor

Processor for Golightly (2018) clinical sample annotations.

Reads a ZIP archive containing per-study clinical text files. Each file is assigned a fixed tissue (from _FILE_METADATA) and may also contain age and sex columns. Ages given as range strings (e.g. "40-50") are averaged to a single value (45 in this case) before being converted to an age group.
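The age-range averaging described above can be sketched as follows. This is a minimal illustration, not the processor's actual code: the helper name age_range_midpoint is hypothetical, and the later mapping from a numeric age to an age group is omitted because the group boundaries are not documented here.

```python
from typing import Optional


def age_range_midpoint(value: str) -> Optional[float]:
    """Average the endpoints of an age string such as "40-50".

    Hypothetical helper: a single number ("45") is returned unchanged,
    and anything non-numeric yields None.
    """
    # Split on the hyphen and drop empty pieces left by stray separators.
    parts = [p.strip() for p in str(value).split("-")]
    try:
        numbers = [float(p) for p in parts if p]
    except ValueError:
        return None  # e.g. "unknown"
    if not numbers:
        return None  # empty string
    return sum(numbers) / len(numbers)


print(age_range_midpoint("40-50"))  # 45.0
```

Returning None for unparseable values keeps the decision about missing ages with the caller instead of silently assigning a group.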

process(output_dir=PROCESSED_DIR, **kwargs)

Process Golightly annotations into standardized format.

Parameters:
  • output_dir (Path, default: PROCESSED_DIR) –

    Directory for processed output.

  • **kwargs (Any, default: {}) –

    Optional keyword arguments. input_path (Path) overrides the ZIP file location.

Returns:
  • DataFrame

    Standardized annotations with columns sample_id, annotation_type, term_id, term_label, and ecode.

Raises:

validate(data)

Validate processed Golightly data.

Parameters:
  • data (DataFrame) –

    Processed annotations to validate.

Returns:
  • bool

    True if validation passes.

Raises:
  • ValidationError

    If required columns are missing or tissue annotations are absent.
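The two checks named under Raises can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: check_annotations is a hypothetical function, the ValidationError class below is a local stand-in, the required columns are taken from the Returns section of process, and tissue rows are assumed to carry the value "tissue" in annotation_type.

```python
from typing import Iterable

# Column names listed in the Returns section of process().
REQUIRED_COLUMNS = ("sample_id", "annotation_type", "term_id", "term_label", "ecode")


class ValidationError(Exception):
    """Local stand-in for the library's ValidationError."""


def check_annotations(columns: Iterable[str], annotation_types: Iterable[str]) -> bool:
    """Return True if both checks pass, otherwise raise ValidationError."""
    present = set(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValidationError(f"missing required columns: {missing}")
    # Assumption: tissue annotations are rows whose annotation_type is "tissue".
    if "tissue" not in set(annotation_types):
        raise ValidationError("no tissue annotations found")
    return True
```

With a DataFrame named data, the equivalent call would be check_annotations(data.columns, data["annotation_type"]).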