Configuration schemas for the metahq-build pipeline.
Defines Pydantic models for validating pipeline configuration.
SampleAnnotationEntry
Bases: BaseModel
Configuration for a sample-level entry from a source in MetaHQ.
SampleTissueAnnotationEntry
Bases: SampleAnnotationEntry
Configuration for sample-level tissue annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has UBERON or CL annotations.
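A minimal sketch of the kind of prefix check this validator describes, written as a plain function rather than the actual Pydantic validator. The `PREFIX:local_id` ID format, the function name `check_ontology_prefix`, and the example IDs are assumptions for illustration; the source only states that the ID must carry UBERON or CL annotations.

```python
def check_ontology_prefix(value: str, allowed: tuple[str, ...] = ("UBERON", "CL")) -> str:
    """Return value unchanged if its ontology prefix is allowed, else raise ValueError.

    Assumes IDs look like 'UBERON:0002107' (prefix, colon, local ID).
    """
    prefix, sep, local_id = value.partition(":")
    if not sep or not local_id or prefix not in allowed:
        raise ValueError(f"{value!r} must start with one of: {', '.join(allowed)}")
    return value


# Hypothetical IDs, for illustration only.
print(check_ontology_prefix("UBERON:0002107"))
print(check_ontology_prefix("CL:0000182"))
try:
    check_ontology_prefix("MONDO:0005015")
except ValueError as err:
    print(f"rejected: {err}")
```

In a Pydantic model, a check like this would typically sit inside a classmethod registered as a field validator and return the value so validation can continue. Passing `allowed=("MONDO",)` gives the analogous check for disease entries.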
SampleDiseaseAnnotationEntry
Bases: SampleAnnotationEntry
Configuration for sample-level disease annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has MONDO annotations.
SampleSexAnnotationEntry
Bases: SampleAnnotationEntry
Configuration for sample-level sex annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has valid sex ID annotations.
SampleAgeAnnotationEntry
Bases: SampleAnnotationEntry
Configuration for sample-level age annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has valid age group ID annotations.
SeriesAnnotationEntry
Bases: BaseModel
Configuration for a series-level entry from a source in MetaHQ.
SeriesTissueAnnotationEntry
Bases: SeriesAnnotationEntry
Configuration for series-level tissue annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has UBERON or CL annotations.
SeriesDiseaseAnnotationEntry
Bases: SeriesAnnotationEntry
Configuration for series-level disease annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has MONDO annotations.
SeriesSexAnnotationEntry
Bases: SeriesAnnotationEntry
Configuration for series-level sex annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has valid sex ID annotations.
SeriesAgeAnnotationEntry
Bases: SeriesAnnotationEntry
Configuration for series-level age annotation entries.
validate_id(v)
classmethod
Ensure an entry ID has valid age group ID annotations.
SampleAccessionIDs
Bases: BaseModel
Configuration for accession IDs for a sample entry.
validate_sample_prefix(v)
classmethod
Ensure all sample IDs start with GSM.
validate_series_prefix(v)
classmethod
Ensure all series IDs start with GSE.
validate_platform_prefix(v)
classmethod
Ensure all platform IDs start with GPL.
validate_xxx_prefix(v)
classmethod
Ensure all SRX IDs start with SRX, ERX, or DRX.
validate_xxs_prefix(v)
classmethod
Ensure all SRS IDs start with SRS, ERS, or DRS.
validate_xxp_prefix(v)
classmethod
Ensure all SRP IDs start with SRP, ERP, or DRP.
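The six prefix validators above share one pattern, which can be sketched with a single helper. This is an illustration under assumptions, not the module's code: the dictionary keys, the helper name `check_prefixes`, the example accessions, and the assumption that each field holds a list of ID strings are all hypothetical. Only the allowed prefixes come from the descriptions above.

```python
# Allowed prefixes per accession type, taken from the validator descriptions.
# The keys are hypothetical labels, not the model's real field names.
ALLOWED_PREFIXES = {
    "sample": ("GSM",),
    "series": ("GSE",),
    "platform": ("GPL",),
    "xxx": ("SRX", "ERX", "DRX"),
    "xxs": ("SRS", "ERS", "DRS"),
    "xxp": ("SRP", "ERP", "DRP"),
}


def check_prefixes(kind: str, ids: list[str]) -> list[str]:
    """Return ids unchanged if every ID starts with an allowed prefix for kind."""
    allowed = ALLOWED_PREFIXES[kind]
    # str.startswith accepts a tuple and matches any of its members.
    bad = [acc for acc in ids if not acc.startswith(allowed)]
    if bad:
        raise ValueError(f"{kind} IDs must start with {' or '.join(allowed)}; got {bad}")
    return ids


# Made-up accessions, for illustration only.
print(check_prefixes("sample", ["GSM0000001", "GSM0000002"]))
print(check_prefixes("xxx", ["SRX0000001", "ERX0000002"]))
try:
    check_prefixes("series", ["GSE0000001", "GSM0000002"])
except ValueError as err:
    print(f"rejected: {err}")
```

Collecting every offending ID before raising, rather than stopping at the first one, makes the error message more useful when several entries are wrong.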
SeriesAccessionIDs
Bases: BaseModel
Configuration for accession IDs for a series entry.
SampleEntry
Bases: BaseModel
Configuration for a sample entry in the database.
verify_organism(v)
classmethod
Check that the organism for a particular entry is valid.
SeriesEntry
Bases: BaseModel
Configuration for a series entry in the database.
verify_organism(v)
classmethod
Check that the organism for a particular entry is valid.
ProcessorConfig
Bases: BaseModel
Configuration for a single data source processor.
OntologyConfig
Bases: BaseModel
Configuration for ontology processing.
PipelineStageConfig
Bases: BaseModel
Configuration for a pipeline stage.
ParallelConfig
Bases: BaseModel
Configuration for parallel processing.
ValidationConfig
Bases: BaseModel
Configuration for data validation.
PipelineConfig
Bases: BaseModel
Main pipeline configuration.
FileEntry
Bases: BaseModel
A source → destination file mapping in the data package structure.
DataPackageConfig
Bases: BaseModel
Full configuration for the MetaHQ setup pipeline, driven by metahq_build.yaml.
data_package_path
property
Return the full path to the data package directory.
md5_path
property
Return the full path to the data package's MD5 checksum file.
from_yaml(file)
classmethod
Load and validate config from metahq_build.yaml.
Flattens params keys into the top level and resolves
{output_dir}/{package_name} placeholders in structure destinations.
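The two transformations described here, flattening params and resolving placeholders, can be sketched on an already-parsed dictionary. This is a rough illustration, not the actual `from_yaml` body: the helper name `flatten_and_resolve`, the `structure` list layout, the `source`/`destination` key names, and the sample values are assumptions. YAML parsing itself is left out.

```python
from copy import deepcopy


def flatten_and_resolve(raw: dict) -> dict:
    """Lift 'params' keys to the top level and fill destination placeholders."""
    config = deepcopy(raw)  # leave the caller's dictionary untouched
    # Flatten: every key under 'params' becomes a top-level key.
    config.update(config.pop("params", {}))
    # Assumes output_dir and package_name are available after flattening.
    placeholders = {
        "{output_dir}": str(config["output_dir"]),
        "{package_name}": str(config["package_name"]),
    }
    for entry in config.get("structure", []):
        destination = entry["destination"]
        for token, value in placeholders.items():
            destination = destination.replace(token, value)
        entry["destination"] = destination
    return config


# Hypothetical stand-in for the parsed contents of metahq_build.yaml.
raw = {
    "params": {"output_dir": "/tmp/out", "package_name": "metahq_data"},
    "structure": [
        {
            "source": "annotations.parquet",
            "destination": "{output_dir}/{package_name}/annotations.parquet",
        }
    ],
}
resolved = flatten_and_resolve(raw)
print(resolved["structure"][0]["destination"])  # /tmp/out/metahq_data/annotations.parquet
```

Plain `str.replace` is used instead of `str.format` so that any other braces in a destination string pass through unchanged.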
get_processor_config(name)
Return config for a named processor, falling back to defaults.
get_stage_config(name)
Return config for a named pipeline stage, falling back to defaults.
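The "falling back to defaults" behavior of both lookup methods can be sketched with standard-library dataclasses in place of the real Pydantic models. Everything named here is hypothetical: the classes `ProcessorSettings` and `Settings`, the fields `enabled` and `batch_size`, and the processor name `geo` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorSettings:
    """Hypothetical stand-in for ProcessorConfig; both fields are invented."""
    enabled: bool = True
    batch_size: int = 1000


@dataclass
class Settings:
    """Hypothetical stand-in for the object that owns the lookup method."""
    processors: dict[str, ProcessorSettings] = field(default_factory=dict)

    def get_processor_config(self, name: str) -> ProcessorSettings:
        # Return the explicit entry if present, otherwise an all-defaults config.
        return self.processors.get(name, ProcessorSettings())


settings = Settings(processors={"geo": ProcessorSettings(batch_size=50)})
print(settings.get_processor_config("geo").batch_size)       # 50
print(settings.get_processor_config("unlisted").batch_size)  # 1000
```

With this pattern, callers never need to check whether a processor or stage was configured explicitly; an unknown name simply yields the default settings.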
verify_source_files()
Ensure every source file listed in the structure exists.
Logs all missing files before exiting so the user can fix them all at once.
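A minimal sketch of the collect-then-exit pattern described above, assuming the source paths are available as a simple list of strings. The helper `find_missing`, the logger name, the exit code, and the standalone function signature are assumptions for illustration; the real method reads its paths from the configured structure.

```python
import logging
import sys
from pathlib import Path

logger = logging.getLogger("metahq_build_example")  # hypothetical logger name


def find_missing(sources: list[str]) -> list[Path]:
    """Return every listed path that does not exist on disk."""
    return [Path(src) for src in sources if not Path(src).exists()]


def verify_source_files(sources: list[str]) -> None:
    """Log all missing files, then exit once if any were missing."""
    missing = find_missing(sources)
    for path in missing:
        logger.error("Missing source file: %s", path)
    if missing:
        # Exit only after the full list has been reported.
        sys.exit(1)


# The current directory exists; the second path is assumed not to.
print(find_missing([".", "no_such_file_for_metahq_example.txt"]))
```

Checking every path before exiting is what lets the user fix all missing files in one pass instead of rerunning the pipeline once per error.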
create_directories()
Create all required pipeline directories.