Base class for data source processors.
Defines the interface that all data source processors must implement to ensure consistency across the pipeline.
ProcessorError
Bases: Exception
Exception raised when a processor encounters an error.
ValidationError
Bases: Exception
Exception raised when processor validation fails.
BaseProcessor
Bases: ABC
Abstract base class for all data source processors.
All data source processors must inherit from this class and implement the required methods. This ensures a consistent interface across all processors in the pipeline.
__init__()
Initialize the base processor.
process(output_dir, **kwargs)
abstractmethod
Process raw data into standardized annotation format.
The output DataFrame must have the following columns:

- accession: str - Sample or study identifier (GSM, GSE, SRR, etc.)
- attribute: str - Type of annotation (tissue, disease, cell_type, sex, age)
- term_id: str - Ontology term ID (e.g., MONDO:0004994, UBERON:0000948)
- term_name: str - Human-readable term label
- ecode: str - Evidence code (expert, semi, crowd, automated)
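As an illustration of the column layout described for `process`, the sketch below builds a small DataFrame with the five required columns. The accessions are placeholders and the term labels are illustrative; only the column names and the example term IDs come from the description above.

```python
import pandas as pd

# Column names as listed in the process() description.
REQUIRED_COLUMNS = ["accession", "attribute", "term_id", "term_name", "ecode"]

# Placeholder rows for illustration only; accessions are made up.
rows = [
    {
        "accession": "GSM0000001",
        "attribute": "tissue",
        "term_id": "UBERON:0000948",
        "term_name": "heart",
        "ecode": "expert",
    },
    {
        "accession": "GSM0000002",
        "attribute": "disease",
        "term_id": "MONDO:0004994",
        "term_name": "cardiomyopathy",
        "ecode": "automated",
    },
]

# Fix the column order explicitly and store every value as a string.
annotations = pd.DataFrame(rows, columns=REQUIRED_COLUMNS).astype(str)
print(annotations)
```

Passing `columns=REQUIRED_COLUMNS` keeps the column order stable regardless of the key order in each row.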
validate(data)
abstractmethod
Validate that processed data meets requirements.
Checks that the DataFrame has the required columns and that values are in the expected format.
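A concrete `validate` implementation might perform checks along the following lines. This is a minimal sketch, not the library's code: the helper name `find_problems` is hypothetical, and it assumes that the attribute types and evidence codes listed in the `process` description are the complete sets of allowed values.

```python
import pandas as pd

REQUIRED_COLUMNS = ["accession", "attribute", "term_id", "term_name", "ecode"]
# Assumption: the values named in the process() description are exhaustive.
ALLOWED_ATTRIBUTES = {"tissue", "disease", "cell_type", "sex", "age"}
ALLOWED_ECODES = {"expert", "semi", "crowd", "automated"}


def find_problems(data: pd.DataFrame) -> list:
    """Return a list of human-readable problems; empty means the data passed."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        # Value checks are skipped when columns are absent.
        return [f"missing columns: {missing}"]

    problems = []
    bad_attributes = data.loc[
        ~data["attribute"].isin(ALLOWED_ATTRIBUTES), "attribute"
    ].unique().tolist()
    if bad_attributes:
        problems.append(f"unexpected attribute values: {bad_attributes}")

    bad_ecodes = data.loc[
        ~data["ecode"].isin(ALLOWED_ECODES), "ecode"
    ].unique().tolist()
    if bad_ecodes:
        problems.append(f"unexpected ecode values: {bad_ecodes}")
    return problems
```

A subclass could call such a helper from `validate` and raise `ValidationError` when the returned list is non-empty.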
cleanup(temp_dir)
Clean up temporary files after processing.
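The source does not show how `cleanup` removes files. One plausible approach, sketched here with a hypothetical helper `remove_temp_dir`, is to delete the temporary directory tree with the standard library and treat a missing directory as a no-op.

```python
import shutil
import tempfile
from pathlib import Path


def remove_temp_dir(temp_dir):
    """Delete temp_dir and everything in it; do nothing if it is absent.

    Hypothetical helper, not the library's cleanup() implementation.
    """
    path = Path(temp_dir)
    if path.is_dir():
        shutil.rmtree(path)


# Usage: create a scratch directory, write an intermediate file, then clean up.
scratch = Path(tempfile.mkdtemp())
(scratch / "download.tmp").write_text("intermediate data")
remove_temp_dir(scratch)
```

Checking `is_dir()` first makes the helper safe to call more than once on the same path.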
run(output_dir=PROCESSED_DIR, validate_output=True, **kwargs)
Run the complete processor workflow: process the raw data, then validate the output when validate_output is True.
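The process-then-validate workflow can be sketched as a template method on an abstract base class. Everything below is a hypothetical stand-in rather than the real `BaseProcessor`: the class names, the `"processed"` default that replaces `PROCESSED_DIR`, and the choice to return the processed data from `run` are all assumptions. To keep the sketch dependency-free, a list of row dictionaries stands in for the DataFrame.

```python
from abc import ABC, abstractmethod

REQUIRED_COLUMNS = ["accession", "attribute", "term_id", "term_name", "ecode"]


class SketchProcessor(ABC):
    """Hypothetical stand-in for BaseProcessor (not the library's code)."""

    @abstractmethod
    def process(self, output_dir, **kwargs):
        """Return processed annotation rows."""

    @abstractmethod
    def validate(self, data):
        """Raise if the processed rows do not meet requirements."""

    def run(self, output_dir="processed", validate_output=True, **kwargs):
        # Assumed workflow: process first, then validate if requested.
        data = self.process(output_dir, **kwargs)
        if validate_output:
            self.validate(data)
        return data


class ToyProcessor(SketchProcessor):
    """Minimal subclass that implements both abstract methods."""

    def process(self, output_dir, **kwargs):
        # A real processor would read raw data; this returns one fixed row.
        return [{
            "accession": "GSM0000001",
            "attribute": "tissue",
            "term_id": "UBERON:0000948",
            "term_name": "heart",
            "ecode": "expert",
        }]

    def validate(self, data):
        for row in data:
            missing = [col for col in REQUIRED_COLUMNS if col not in row]
            if missing:
                raise ValueError(f"missing columns: {missing}")
        return True


result = ToyProcessor().run()
```

Because `process` and `validate` are abstract, instantiating `SketchProcessor` directly raises `TypeError`, which is how the ABC enforces the shared interface.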
__repr__()
Return a string representation of the processor.