Bases: BaseProcessor

Processor for ALE (Giles et al.) manual annotations.

ALE provides expert-curated annotations for GEO samples. Tissue terms are stored as BTO (BRENDA Tissue Ontology) IDs and mapped to UBERON/CL terms using a helper CSV file. Age in months is converted to years and then to a discrete age group.
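The age conversion described above (months to years, then years to a group) can be sketched as follows. This is a minimal illustration, not the processor's code: the bin boundaries and labels in the hypothetical AGE_GROUPS table are assumptions, since the real age groups are not documented here.

```python
from typing import List, Optional, Tuple

# Hypothetical age-group bins in years (lower bound inclusive, upper exclusive).
# The processor's real boundaries and labels are not documented here.
AGE_GROUPS: List[Tuple[float, float, str]] = [
    (0.0, 2.0, "infant"),
    (2.0, 13.0, "child"),
    (13.0, 20.0, "adolescent"),
    (20.0, 65.0, "adult"),
    (65.0, float("inf"), "elderly"),
]


def months_to_years(age_months: float) -> float:
    """Convert an age in months to fractional years."""
    return age_months / 12.0


def age_group(age_months: Optional[float]) -> Optional[str]:
    """Map an age in months to a discrete group label, or None if unusable."""
    # Reject missing, NaN (NaN != NaN), and negative ages.
    if age_months is None or age_months != age_months or age_months < 0:
        return None
    years = months_to_years(age_months)
    for lower, upper, label in AGE_GROUPS:
        if lower <= years < upper:
            return label
    return None


print(age_group(18))   # infant (1.5 years)
print(age_group(24))   # child (exactly 2.0 years falls in the second bin)
print(age_group(360))  # adult (30 years)
```

Converting to years first keeps the bin table in the units people usually reason about, so changing a boundary does not require recomputing month counts.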

process(output_dir=PROCESSED_DIR, **kwargs)

Process ALE annotations into a standardized format.

Reads the raw ALE TSV, maps BTO tissue IDs to UBERON/CL via the helper CSV, maps sex codes to PATO terms, and converts age in months to age group labels.
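The two term-mapping steps above can be sketched with the standard library alone. The helper CSV column names (bto_id, term_id, term_label) and the short sex codes in SEX_ALIASES are assumptions about the file formats; the PATO identifiers (PATO:0000383 for female, PATO:0000384 for male) are the real PATO terms. The single liver row in the usage lines is illustrative only.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple

# PATO terms for biological sex: PATO:0000383 is "female", PATO:0000384 is "male".
SEX_TO_PATO: Dict[str, Tuple[str, str]] = {
    "female": ("PATO:0000383", "female"),
    "male": ("PATO:0000384", "male"),
}

# Hypothetical shorthand codes; the raw ALE TSV's actual sex encoding is an assumption.
SEX_ALIASES = {"f": "female", "m": "male"}


def load_bto_map(handle: TextIO) -> Dict[str, Tuple[str, str]]:
    """Build {BTO ID: (UBERON/CL ID, label)} from a CSV with assumed
    columns bto_id, term_id, term_label."""
    return {
        row["bto_id"].strip(): (row["term_id"].strip(), row["term_label"].strip())
        for row in csv.DictReader(handle)
    }


def map_sex(code: str) -> Optional[Tuple[str, str]]:
    """Return (PATO ID, label) for a raw sex code, or None if unrecognized."""
    key = code.strip().lower()
    return SEX_TO_PATO.get(SEX_ALIASES.get(key, key))


# Illustrative helper CSV with a single liver row.
helper_csv = io.StringIO("bto_id,term_id,term_label\nBTO:0000759,UBERON:0002107,liver\n")
bto_map = load_bto_map(helper_csv)
print(bto_map.get("BTO:0000759"))  # ('UBERON:0002107', 'liver')
print(map_sex("F"))                # ('PATO:0000383', 'female')
print(map_sex("unknown"))          # None
```

Returning None for unrecognized codes lets the caller decide whether to drop the annotation or flag it, rather than emitting an empty term ID.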

Parameters:
  • output_dir (Path, default: PROCESSED_DIR) –

    Directory for processed output.

  • **kwargs (Any, default: {}) –

    Optional overrides:
      • input_path (Path): location of the raw ALE TSV.
      • bto_uberon_path (Path): location of the BTO→UBERON/CL helper CSV.
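One plausible way the two keyword overrides could be resolved against defaults is sketched below. The default paths and the resolve_input_paths helper are hypothetical; only the key names input_path and bto_uberon_path come from this reference.

```python
from pathlib import Path
from typing import Any, Dict

# Hypothetical default locations; the processor's real defaults are not listed here.
DEFAULT_INPUT_PATH = Path("data/raw/ale/annotations.tsv")
DEFAULT_BTO_UBERON_PATH = Path("data/raw/ale/bto_uberon.csv")


def resolve_input_paths(**kwargs: Any) -> Dict[str, Path]:
    """Use each override from kwargs when given (and not None), else the default."""
    return {
        "input_path": Path(kwargs.get("input_path") or DEFAULT_INPUT_PATH),
        "bto_uberon_path": Path(kwargs.get("bto_uberon_path") or DEFAULT_BTO_UBERON_PATH),
    }


print(resolve_input_paths(input_path="my_ale.tsv")["input_path"])  # my_ale.tsv
```

From the caller's side, the documented form is simply processor.process(input_path=Path("my_ale.tsv")), where processor is an instance of this class. Wrapping each value in Path means plain strings are accepted as well.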

Returns:
  • DataFrame

    Standardized annotations with columns sample_id, annotation_type, term_id, term_label, and ecode.
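The returned table is in long format: one row per sample and annotation. The sketch below builds such rows as plain dicts in the documented column order. The annotation_type values ("tissue", "sex", "age"), the sample ID, and the ecode value are placeholders, because the real values are not given here.

```python
from typing import Dict, List, Optional, Tuple

# Output column order taken from the documented return value.
COLUMNS = ["sample_id", "annotation_type", "term_id", "term_label", "ecode"]


def to_long_rows(
    sample_id: str,
    terms: Dict[str, Optional[Tuple[str, str]]],
    ecode: str,
) -> List[Dict[str, str]]:
    """Emit one row per mapped annotation type; unmapped (None) values are skipped."""
    rows = []
    for annotation_type, term in terms.items():
        if term is None:
            continue
        term_id, term_label = term
        rows.append(dict(zip(COLUMNS, (sample_id, annotation_type, term_id, term_label, ecode))))
    return rows


rows = to_long_rows(
    "GSM0000001",  # illustrative sample ID
    {"tissue": ("UBERON:0002107", "liver"), "sex": ("PATO:0000383", "female"), "age": None},
    ecode="EXAMPLE_ECODE",  # placeholder: the real ecode value is not documented
)
print(len(rows))  # 2
```

With pandas installed, pandas.DataFrame(rows, columns=COLUMNS) turns these records into a DataFrame with the same columns as the return value.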

validate(data)

Validate processed ALE data.

Parameters:
  • data (DataFrame) –

    Processed annotations to validate.

Returns:
  • bool

    True if validation passes.

Raises:
  • ValidationError

    If required columns are missing or tissue annotations are absent.
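The two documented failure conditions can be sketched as the following checks. This version works on a column list and plain dict rows so it runs without pandas. ValidationError is defined locally as a stand-in for the project's own exception, and the "tissue" annotation_type value is an assumption.

```python
from typing import Iterable, Mapping, Sequence

REQUIRED_COLUMNS = {"sample_id", "annotation_type", "term_id", "term_label", "ecode"}


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumed Exception subclass)."""


def validate_rows(columns: Iterable[str], rows: Sequence[Mapping[str, str]]) -> bool:
    """Check required columns first, then require at least one tissue annotation."""
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        raise ValidationError(f"missing required columns: {sorted(missing)}")
    if not any(row.get("annotation_type") == "tissue" for row in rows):
        raise ValidationError("no tissue annotations found")
    return True
```

For a DataFrame, the same checks would take data.columns as the column list and use (data["annotation_type"] == "tissue").any() for the tissue test. Checking columns first ensures the tissue lookup never touches a missing annotation_type column.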