Orchestrates the complete MetaHQ database build pipeline.

Runs processing for all enabled sources in alphabetical order, then combines GEO, SRA, and sample annotations. Supports checkpoint-based resumption so a failed run can be restarted from the last completed stage.
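The resumption behaviour described above can be illustrated with a minimal sketch. It is not the MetaHQ implementation: the JSON checkpoint format, the helper name run_with_checkpoint, and the stage names "geo" and "sra" are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoint(stages, checkpoint_path):
    """Run (name, callable) stages in order, skipping names already recorded.

    Hypothetical helper: the checkpoint is assumed to be a JSON list of
    completed stage names.
    """
    path = Path(checkpoint_path)
    completed = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, func in stages:
        if name in completed:
            continue  # finished in an earlier run
        func()  # an exception here leaves the checkpoint file untouched
        completed.append(name)
        path.write_text(json.dumps(completed))  # record progress immediately
        executed.append(name)
    return executed


def failing_stage():
    raise RuntimeError("simulated failure")


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    try:
        # First run: "geo" completes, "sra" fails.
        run_with_checkpoint([("geo", lambda: None), ("sra", failing_stage)], ckpt)
    except RuntimeError:
        pass
    # Second run: "geo" is skipped, only "sra" is executed.
    resumed = run_with_checkpoint([("geo", lambda: None), ("sra", lambda: None)], ckpt)
```

Writing the checkpoint after every stage, rather than once at the end, is what lets the restarted run pick up at the stage that failed.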

All pipeline behaviour is driven from a DataPackageConfig loaded from metahq_build.yaml.

run(start_from=None, end_at=None)

Execute all pipeline stages in order with checkpointing.

Stages already recorded as complete in the checkpoint file are skipped automatically, unless the stage sets use_checkpoint: false in config. Stages with skip: true in config are always bypassed. Use start_from to skip every stage before the named one, or end_at to stop once the named stage has run.

Parameters:
  • start_from (str | None, default: None) –

    Stage name to resume from. All earlier stages are skipped regardless of checkpoint state.

  • end_at (str | None, default: None) –

    Stage name to stop after. Later stages are not executed.
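
The way these options interact can be sketched as a pure function that returns the stages a call to run would execute. This is an illustrative model only: select_stages, the stage names, and the dictionary shape of the config are hypothetical, and it assumes that checkpoint state still applies to the start_from stage and those after it, which the text above does not spell out.

```python
def select_stages(stage_names, completed, config, start_from=None, end_at=None):
    """Return the stage names that would execute, in order.

    Hypothetical model of the documented rules; raises ValueError if
    start_from or end_at is not a known stage name.
    """
    names = list(stage_names)
    first = names.index(start_from) if start_from is not None else 0
    last = names.index(end_at) if end_at is not None else len(names) - 1
    selected = []
    for name in names[first:last + 1]:  # end_at is inclusive ("stop after")
        opts = config.get(name, {})
        if opts.get("skip", False):
            continue  # skip: true always bypasses the stage
        if opts.get("use_checkpoint", True) and name in completed:
            continue  # already completed and checkpointing is honoured
        selected.append(name)
    return selected


# Hypothetical stage names and per-stage options.
stages = ["geo", "sra", "annotations", "combine"]
completed = {"geo", "sra"}
config = {"sra": {"use_checkpoint": False}, "annotations": {"skip": True}}

default_run = select_stages(stages, completed, config)
single_stage = select_stages(stages, completed, config, start_from="sra", end_at="sra")
fresh_tail = select_stages(stages, set(), {}, start_from="annotations")
```

In this model, "geo" is skipped because it is checkpointed, "sra" reruns because it opts out of checkpointing, and "annotations" never runs because it is marked skip.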