Bases: BaseAnnotationCombiner
Combines annotations from SRA-based sources, mapping accession IDs to GSM.
Run-level (xxR) IDs are first resolved to experiment-level (xxX) IDs via
src_sra_runs, then both xxR and xxX IDs are mapped to GEO sample IDs
(GSM) via src_geo_samples in the OmicIDX DuckDB database.
Example:
>>> combiner = SraCombiner()
>>> combiner.combine().clean().save(SRA_COMBINED_BSON)
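The two-step ID resolution described above can be sketched with plain dictionaries. This is a minimal illustration, not the combiner's actual code: `resolve_gsm` and both lookup tables are hypothetical stand-ins for queries against src_sra_runs and src_geo_samples, the accession values are made up, and the lookup order (experiment ID first, then the run ID itself) is an assumption.

```python
# Hypothetical stand-ins for the two OmicIDX tables; the real column
# layout of src_sra_runs and src_geo_samples is not shown in this page.
RUN_TO_EXPERIMENT = {"SRR0000001": "SRX0000001"}
ACCESSION_TO_GSM = {
    "SRX0000001": "GSM0000001",  # experiment-level mapping
    "SRR0000002": "GSM0000002",  # run-level mapping
}


def resolve_gsm(accession, run_to_experiment, accession_to_gsm):
    """Return the GSM ID for a run or experiment accession, or None."""
    candidates = []
    # Step 1: a run-level ID is resolved to its experiment-level ID, if known.
    experiment = run_to_experiment.get(accession)
    if experiment is not None:
        candidates.append(experiment)
    # Step 2: both the experiment ID and the original ID are tried against
    # the GSM mapping (trying the experiment first is an assumption).
    candidates.append(accession)
    for candidate in candidates:
        gsm = accession_to_gsm.get(candidate)
        if gsm is not None:
            return gsm
    return None


via_experiment = resolve_gsm("SRR0000001", RUN_TO_EXPERIMENT, ACCESSION_TO_GSM)
direct = resolve_gsm("SRR0000002", RUN_TO_EXPERIMENT, ACCESSION_TO_GSM)
unmapped = resolve_gsm("SRR9999999", RUN_TO_EXPERIMENT, ACCESSION_TO_GSM)
```

Returning `None` for an unmapped accession leaves it to the caller to decide whether to drop the row.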
combine(db_path=OMICIDX_DB, overrides=None)
Load and combine all SRA source parquets, mapping IDs to GSM.
Sources whose parquet file does not exist are skipped with a warning. Within each source, rows whose accession ID cannot be mapped to a GSM are dropped and counted.
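The skip-and-drop behaviour described above might look roughly like the following. This is a sketch under stated assumptions, not the actual implementation: `combine_sources`, `load_rows`, and `to_gsm` are hypothetical names, the row shape `(accession, annotation)` is invented for illustration, and parquet reading is left to the injected loader so the example needs only the standard library.

```python
import warnings
from pathlib import Path


def combine_sources(source_paths, load_rows, to_gsm):
    """Merge per-source annotations keyed by GSM ID.

    source_paths: dict of source name -> parquet path (hypothetical shape).
    load_rows:    callable(path) -> iterable of (accession, annotation).
    to_gsm:       callable(accession) -> GSM ID, or None if unmapped.
    """
    combined = {}
    dropped = {}
    for name, path in source_paths.items():
        # A source whose file is missing is skipped with a warning.
        if not Path(path).exists():
            warnings.warn(f"Skipping source {name!r}: {path} not found")
            continue
        n_dropped = 0
        for accession, annotation in load_rows(path):
            gsm = to_gsm(accession)
            if gsm is None:
                # Rows that cannot be mapped to a GSM are dropped and counted.
                n_dropped += 1
                continue
            combined.setdefault(gsm, {})[name] = annotation
        dropped[name] = n_dropped
    return combined, dropped
```

Tracking the drop count per source makes it easy to spot a source whose accessions rarely map to GEO samples.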
| Parameters: |
|
|---|
| Returns: |
|
|---|