metahq-build combine
Combine processed annotations from multiple sources into a BSON file.
Run 'metahq-build process' for each source before combining.
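The default paths documented on this page chain together: 'sample' reads the default outputs of 'geo' and 'sra', and 'series' reads the default output of 'sample'. The following is a minimal sketch of one run order consistent with that, assuming default paths throughout. The names `COMBINE_STEPS`, `combine_commands`, and `run_all` are hypothetical helpers for illustration and are not part of metahq-build. The 'metahq-build process' step is left out because its arguments are not documented here.

```python
import shlex
import subprocess

# Assumed order: 'sample' needs the geo and sra outputs,
# and 'series' needs the sample output.
COMBINE_STEPS = ["geo", "sra", "sample", "series"]


def combine_commands():
    """Return one argument list per combine subcommand, in run order."""
    return [["metahq-build", "combine", step] for step in COMBINE_STEPS]


def run_all(dry_run=True):
    """Return the command lines; execute them only when dry_run is False."""
    lines = []
    for cmd in combine_commands():
        lines.append(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    for line in run_all(dry_run=True):
        print(line)
```

With `dry_run=True` nothing is executed, so the sketch can be used to review the planned commands before running them.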
geo
Combine all GEO-based source annotations into a single BSON file.
Reads processed parquets from data/processed/ for each GEO source (ale, cello, creeds, disign_atlas, gemma, golightly, gu, johnson_2023, krishnanlab, sirota_2011, ursa, ursahd). Missing sources are skipped.
Examples:
metahq-build combine geo
metahq-build combine geo --output /data/geo_combined.bson
Options:
  -o, --output PATH  Output BSON file path (default: data/processed/geo_combined.bson)
  --help             Show this message and exit.
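Because missing GEO sources are skipped rather than reported as errors, it can help to check beforehand which sources would actually be combined. The sketch below is a pre-flight check only, not the tool's own logic. It assumes each source is stored as a single file named `<source>.parquet` directly under the processed directory; that file layout is an assumption and may differ from what metahq-build writes. `GEO_SOURCES` and `split_sources` are hypothetical names.

```python
from pathlib import Path

# Source names as listed in the 'geo' description above.
GEO_SOURCES = [
    "ale", "cello", "creeds", "disign_atlas", "gemma", "golightly",
    "gu", "johnson_2023", "krishnanlab", "sirota_2011", "ursa", "ursahd",
]


def split_sources(processed_dir, sources=GEO_SOURCES):
    """Partition sources into (present, missing).

    ASSUMPTION: one '<source>.parquet' file per source; the real
    on-disk layout is not documented here.
    """
    processed_dir = Path(processed_dir)
    present, missing = [], []
    for name in sources:
        if (processed_dir / f"{name}.parquet").is_file():
            present.append(name)
        else:
            missing.append(name)
    return present, missing


if __name__ == "__main__":
    present, missing = split_sources("data/processed")
    print("would be combined:", ", ".join(present) or "(none)")
    print("would be skipped: ", ", ".join(missing) or "(none)")
```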
sample
Merge GEO and SRA combined annotations into a single sample-level BSON.
Both GEO and SRA BSONs must already be keyed by GSM. Run 'metahq-build combine geo' and 'metahq-build combine sra' first.
Accession IDs (series, platform, srx, srp) are enriched from OmicIDX for every sample in the combined database.
Examples:
metahq-build combine sample
metahq-build combine sample --output /data/combined__level-sample.bson
metahq-build combine sample --geo /data/geo.bson --sra /data/sra.bson
metahq-build combine sample --metadata-db /data/omicidx.duckdb
Options:
  -o, --output PATH   Output BSON file path (default: data/processed/combined__level-sample.bson)
  --geo PATH          Path to GEO combined BSON (default: data/processed/geo_combined.bson)
  --sra PATH          Path to SRA combined BSON (default: data/processed/sra_combined.bson)
  --metadata-db PATH  Path to OmicIDX DuckDB file (default: data/omicidx.duckdb)
  --specific BOOLEAN  Whether to apply the filter for specific annotations.
  --help              Show this message and exit.
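The requirement that both inputs be keyed by GSM is what makes the merge a simple key-wise union. The sketch below illustrates that idea on plain dictionaries. It is not the tool's implementation: the record shape (a dict of annotation fields per GSM) and the conflict rule (the SRA value replaces the GEO value for the same field) are assumptions made for illustration, and `merge_by_gsm` is a hypothetical name.

```python
def merge_by_gsm(geo, sra):
    """Union two GSM-keyed mappings of annotation dicts.

    ASSUMPTION: on a field conflict the SRA value wins; the real
    tool's conflict handling is not documented here.
    """
    # Copy the inner dicts so the inputs are left unmodified.
    merged = {gsm: dict(ann) for gsm, ann in geo.items()}
    for gsm, ann in sra.items():
        merged.setdefault(gsm, {}).update(ann)
    return merged


if __name__ == "__main__":
    geo = {"GSM1": {"tissue": "liver"}}
    sra = {"GSM1": {"sex": "female"}, "GSM2": {"tissue": "brain"}}
    for gsm, ann in sorted(merge_by_gsm(geo, sra).items()):
        print(gsm, ann)
```

If one input were keyed by a different accession type (for example SRX), the same samples would land under different keys and never be merged, which is why 'combine sra' maps its IDs to GSM first.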
series
Derive a series-level BSON from the sample-level combined BSON. By default, the input is the default output of 'metahq-build combine sample'.
Options:
  -o, --output PATH   Output BSON file path (default: data/processed/combined__level-series.bson)
  --sample PATH       Path to sample combined BSON (default: data/processed/combined__level-sample.bson)
  --specific BOOLEAN  Whether to apply the filter for specific annotations.
  --help              Show this message and exit.
sra
Combine all SRA-based source annotations into a single BSON file.
Reads processed parquets from data/processed/ for each SRA source (bgee, johnson_2023_rnaseq). SRR/SRX accession IDs are mapped to GSM IDs via the OmicIDX DuckDB database. Missing sources are skipped.
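The accession mapping step can be pictured as re-keying a dictionary through a lookup table. In the sketch below a plain dict stands in for the OmicIDX DuckDB lookup, whose schema is not documented here. The function name `rekey_to_gsm`, the record shape, and the choice to collect unmapped accessions instead of raising an error are all assumptions for illustration.

```python
def rekey_to_gsm(annotations, accession_to_gsm):
    """Re-key SRR/SRX-keyed annotations by GSM.

    ASSUMPTION: accessions without a GSM mapping are set aside and
    returned separately; the real tool may handle them differently.
    """
    rekeyed, unmapped = {}, []
    for accession, ann in annotations.items():
        gsm = accession_to_gsm.get(accession)
        if gsm is None:
            unmapped.append(accession)
            continue
        # Several runs or experiments can map to the same GSM.
        rekeyed.setdefault(gsm, {}).update(ann)
    return rekeyed, unmapped


if __name__ == "__main__":
    annotations = {
        "SRX100": {"tissue": "heart"},
        "SRR200": {"sex": "male"},
        "SRX999": {"tissue": "lung"},
    }
    lookup = {"SRX100": "GSM10", "SRR200": "GSM10"}
    rekeyed, unmapped = rekey_to_gsm(annotations, lookup)
    print(rekeyed)
    print("unmapped:", unmapped)
```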
Examples:
metahq-build combine sra
metahq-build combine sra --output /data/sra_combined.bson
metahq-build combine sra --metadata-db /data/omicidx.duckdb
Options: