Bases: BaseProcessor

Processor for Bgee database RNA-Seq library annotations.

Bgee is a database for gene expression patterns across multiple species, providing curated anatomical, developmental stage, and sex annotations for RNA-Seq libraries.

Processes data for six species:

  • Mus musculus (mouse)
  • Homo sapiens (human)
  • Rattus norvegicus (rat)
  • Caenorhabditis elegans (worm)
  • Danio rerio (zebrafish)
  • Drosophila melanogaster (fly)

process(output_dir, **kwargs)

Process Bgee RNA-Seq library data into standardized annotations.

Parameters:
  • output_dir (Path) –

    Directory where the processed parquet file will be written.

  • **kwargs (Any, default: {}) –

    Optional species-specific file path overrides.

Returns:
  • DataFrame

    Standardized annotations with columns sample_id, annotation_type, term_id, term_label, and ecode.
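As a rough illustration of the column layout described above, the sketch below builds two rows with the five documented columns and checks that none is missing. It uses plain dictionaries rather than a DataFrame so that it runs without third-party packages. Only the column names come from this reference. The helper name `missing_columns`, the library identifiers, the `annotation_type` values, and the `ecode` values are hypothetical placeholders, not actual Bgee output.

```python
# Column names taken from the Returns description of process().
REQUIRED_COLUMNS = ("sample_id", "annotation_type", "term_id", "term_label", "ecode")

# Hypothetical rows for illustration only: the sample IDs, annotation_type
# values, and ecode values are placeholders, not real processor output.
example_rows = [
    {
        "sample_id": "LIB_0001",
        "annotation_type": "anatomy",
        "term_id": "UBERON:0002107",
        "term_label": "liver",
        "ecode": "PLACEHOLDER",
    },
    {
        "sample_id": "LIB_0001",
        "annotation_type": "sex",
        "term_id": "PATO:0000383",
        "term_label": "female",
        "ecode": "PLACEHOLDER",
    },
]


def missing_columns(rows):
    """Return required columns absent from at least one row, in schema order."""
    return [col for col in REQUIRED_COLUMNS if any(col not in row for row in rows)]


if __name__ == "__main__":
    # An empty list means every row carries all five documented columns.
    print(missing_columns(example_rows))
```

A check of this kind is one plausible ingredient of a validation step, but the processor's actual validation rules are not specified here.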

validate(data)

Validate that processed Bgee data meets requirements.

Parameters:
  • data (DataFrame) –

    Processed annotations DataFrame to validate.

Returns:
  • bool

    True if validation passes.

Raises: