Bases: BaseProcessor

Processor for Gemma database annotations.

Download the raw data with metahq-build download gemma before processing.

process(output_dir=PROCESSED_DIR, **kwargs)

Process Gemma annotations into a standardized format.

Reads from the raw JSON file produced by metahq-build download gemma (default location: data/unprocessed/gemma.json). Raises ProcessorError if that file does not exist.

Parameters:
  • output_dir (Path, default: PROCESSED_DIR) –

    Directory for processed output.

  • **kwargs (Any, default: {}) –

    Optional keyword arguments. input_path (Path) overrides the raw JSON file location.

Returns:
  • DataFrame

    Standardized annotations with columns sample_id, annotation_type, term_id, term_label, and ecode.

Raises:
  • ProcessorError

    If the raw JSON file does not exist.
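
The input-path behavior described for process can be sketched as follows. This is a minimal illustration, not the library's implementation: resolve_raw_path is a hypothetical helper, and ProcessorError is redefined locally as a stand-in so the snippet runs on its own. Only the default location and the input_path keyword come from the description above.

```python
from pathlib import Path

# Default raw-file location, as stated in the description of process().
DEFAULT_RAW_PATH = Path("data/unprocessed/gemma.json")


class ProcessorError(Exception):
    """Local stand-in for the library's ProcessorError (assumption)."""


def resolve_raw_path(**kwargs) -> Path:
    """Hypothetical helper: pick the raw JSON path and fail early if it is missing."""
    # An explicit input_path keyword overrides the default location.
    raw_path = Path(kwargs.get("input_path", DEFAULT_RAW_PATH))
    if not raw_path.is_file():
        raise ProcessorError(
            f"Raw Gemma data not found at {raw_path}; "
            "run `metahq-build download gemma` first."
        )
    return raw_path
```

Calling resolve_raw_path(input_path=some_path) returns the path when the file exists and raises the stand-in ProcessorError otherwise, mirroring the documented failure mode.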

validate(data)

Validate processed Gemma data.

Parameters:
  • data (DataFrame) –

    Processed annotations to validate.

Returns:
  • bool

    True if validation passes.

Raises:
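
What validate checks internally is not stated here. As a rough sketch of one plausible check, the hypothetical helper below compares a set of column names against the columns listed in the return value of process. The helper name and the check itself are assumptions, not the library's behavior.

```python
from typing import Iterable, List

# Column names listed for the DataFrame returned by process().
EXPECTED_COLUMNS = ("sample_id", "annotation_type", "term_id", "term_label", "ecode")


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Hypothetical helper: return expected columns absent from `columns`, in order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With a pandas DataFrame df, missing_columns(df.columns) returns an empty list when every expected column is present.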