metahq-build combine

Combine processed annotations from multiple sources into a BSON file.

Run 'metahq-build process' for each source before combining.

Usage:

metahq-build combine [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

geo

Combine all GEO-based source annotations into a single BSON file.

Reads processed parquets from data/processed/ for each GEO source (ale, cello, creeds, disign_atlas, gemma, golightly, gu, johnson_2023, krishnanlab, sirota_2011, ursa, ursahd). Missing sources are skipped.

Examples:

metahq-build combine geo
metahq-build combine geo --output /data/geo_combined.bson

Usage:

metahq-build combine geo [OPTIONS]

Options:

  -o, --output PATH  Output BSON file path (default:
                     data/processed/geo_combined.bson)
  --help             Show this message and exit.

sample

Merge GEO and SRA combined annotations into a single sample-level BSON.

Both GEO and SRA BSONs must already be keyed by GSM. Run 'metahq-build combine geo' and 'metahq-build combine sra' first.

Accession IDs (series, platform, srx, srp) are enriched from OmicIDX for every sample in the combined database.

Examples:

metahq-build combine sample
metahq-build combine sample --output /data/combined__level-sample.bson
metahq-build combine sample --geo /data/geo.bson --sra /data/sra.bson
metahq-build combine sample --metadata-db /data/omicidx.duckdb

Usage:

metahq-build combine sample [OPTIONS]

Options:

  -o, --output PATH   Output BSON file path (default:
                      data/processed/combined__level-sample.bson)
  --geo PATH          Path to GEO combined BSON (default:
                      data/processed/geo_combined.bson)
  --sra PATH          Path to SRA combined BSON (default:
                      data/processed/sra_combined.bson)
  --metadata-db PATH  Path to OmicIDX DuckDB file (default:
                      data/omicidx.duckdb)
  --specific BOOLEAN  Whether to filter for specific annotations.
  --help              Show this message and exit.
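
Because this command reads three input files, it can help to confirm they exist before starting. The sketch below is a hypothetical pre-flight helper (`missing_inputs` is not part of metahq-build); it relies only on the default paths listed in the options above.

```shell
# Hypothetical pre-flight helper; not part of metahq-build.
# Prints each path that does not exist and returns non-zero if any is missing.
missing_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Default input paths taken from the option list above.
if missing_inputs \
    data/processed/geo_combined.bson \
    data/processed/sra_combined.bson \
    data/omicidx.duckdb
then
    echo "inputs present; ready to run: metahq-build combine sample"
fi
```

If custom paths are passed via --geo, --sra, or --metadata-db, the same paths would be given to the helper instead of the defaults.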

series

Build a series-level BSON from the sample-level combined BSON.

Usage:

metahq-build combine series [OPTIONS]

Options:

  -o, --output PATH   Output BSON file path (default:
                      data/processed/combined__level-series.bson)
  --sample PATH       Path to sample combined BSON (default:
                      data/processed/combined__level-sample.bson)
  --specific BOOLEAN  Whether to filter for specific annotations.
  --help              Show this message and exit.
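
Taken together, the subcommands in this section have a dependency order: geo and sra first, then sample, then series. A minimal sketch of that order as a wrapper script follows. It uses only the flags documented in this section; the `run` and `combine_all` helpers and the `DRY_RUN` variable are illustrative names, not part of metahq-build. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```shell
# Hypothetical wrapper; `run`, `combine_all`, and DRY_RUN are illustrative names.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"    # print the command instead of executing it
    else
        "$@"
    fi
}

combine_all() {
    out="${1:-data/processed}"
    # 1. Per-database combines ('metahq-build process' must have been run first).
    run metahq-build combine geo --output "$out/geo_combined.bson" || return 1
    run metahq-build combine sra --output "$out/sra_combined.bson" || return 1
    # 2. Sample level: merges the two files written above.
    run metahq-build combine sample \
        --geo "$out/geo_combined.bson" \
        --sra "$out/sra_combined.bson" \
        --output "$out/combined__level-sample.bson" || return 1
    # 3. Series level: reads the sample-level file.
    run metahq-build combine series \
        --sample "$out/combined__level-sample.bson" \
        --output "$out/combined__level-series.bson" || return 1
}

combine_all /data
```

Setting `DRY_RUN=0` would execute the four commands in sequence and stop at the first one that fails.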

sra

Combine all SRA-based source annotations into a single BSON file.

Reads processed parquets from data/processed/ for each SRA source (bgee, johnson_2023_rnaseq). SRR/SRX accession IDs are mapped to GSM IDs via the OmicIDX DuckDB database. Missing sources are skipped.

Examples:

metahq-build combine sra
metahq-build combine sra --output /data/sra_combined.bson
metahq-build combine sra --metadata-db /data/omicidx.duckdb

Usage:

metahq-build combine sra [OPTIONS]

Options:

  -o, --output PATH   Output BSON file path (default:
                      data/processed/sra_combined.bson)
  --metadata-db PATH  Path to OmicIDX DuckDB file (default:
                      data/omicidx.duckdb)
  --help              Show this message and exit.