Base class for building a combined annotation dict from processor outputs.
Subclasses implement combine() to load source data and call
add_source() once for each source. The clean() → save() workflow is
defined here and shared across all combiners.
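The division of labour described above can be sketched roughly as follows. This is only an illustration of the pattern, not the real base class: `SketchCombiner`, `TwoSourceCombiner`, the `annotations` attribute, and the sample data are all hypothetical, and `clean()` is reduced to dropping empty sources. `save()` is left out because it depends on a BSON library.

```python
from abc import ABC, abstractmethod


class SketchCombiner(ABC):
    """Hypothetical stand-in for the base class (not the real implementation)."""

    def __init__(self):
        # Assumed storage: one entry per source name.
        self.annotations = {}

    @abstractmethod
    def combine(self):
        """Subclasses load their source data and call add_source() per source."""

    def add_source(self, source_name, data):
        # The real method groups DataFrame rows; here we just store the data.
        self.annotations[source_name] = data

    def clean(self):
        # Simplified: keep only sources that still hold some data.
        self.annotations = {k: v for k, v in self.annotations.items() if v}
        return self.annotations


class TwoSourceCombiner(SketchCombiner):
    """Hypothetical subclass: only combine() is specific to it."""

    def combine(self):
        # Made-up in-memory data standing in for loaded processor outputs.
        loaded = {
            "source_a": [{"term_id": "TERM:1"}],
            "source_b": [],
        }
        for name, data in loaded.items():
            self.add_source(name, data)


combiner = TwoSourceCombiner()
combiner.combine()          # subclass-specific loading
result = combiner.clean()   # shared workflow step
```

The point of the design is that a new combiner only has to say where its data comes from; the cleaning and saving steps stay identical across subclasses.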
| Attributes: |
|
|---|
add_source(source_name, data)
Add annotations from a standard-schema DataFrame.
Rows are grouped by (COL_ACCESSION, COL_ATTRIBUTE). Multiple term
IDs and labels for the same group are joined with DELIMITER. The ecode
of the first row in the group is used (processors produce a single
ecode per source).
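The grouping rule can be illustrated with a small dependency-free sketch. It uses a list of dicts in place of the DataFrame, and the column names, the `"|"` delimiter, the field names `term_id`, `label`, and `ecode`, and the sample rows are all assumptions made for this example; the real constants may hold different values.

```python
# Hypothetical values for the constants named above.
COL_ACCESSION = "accession"
COL_ATTRIBUTE = "attribute"
DELIMITER = "|"


def group_rows(rows):
    """Group rows by (accession, attribute) and join term IDs and labels."""
    grouped = {}
    for row in rows:
        key = (row[COL_ACCESSION], row[COL_ATTRIBUTE])
        if key not in grouped:
            # The first row seen for a group supplies its ecode.
            grouped[key] = {"term_ids": [], "labels": [], "ecode": row["ecode"]}
        grouped[key]["term_ids"].append(row["term_id"])
        grouped[key]["labels"].append(row["label"])
    return {
        key: {
            "term_id": DELIMITER.join(group["term_ids"]),
            "label": DELIMITER.join(group["labels"]),
            "ecode": group["ecode"],
        }
        for key, group in grouped.items()
    }


# Made-up rows: two terms for the same (accession, attribute) pair.
rows = [
    {"accession": "S1", "attribute": "tissue", "term_id": "TERM:1", "label": "liver", "ecode": "E1"},
    {"accession": "S1", "attribute": "tissue", "term_id": "TERM:2", "label": "hepatocyte", "ecode": "E1"},
    {"accession": "S2", "attribute": "disease", "term_id": "TERM:3", "label": "asthma", "ecode": "E1"},
]
grouped = group_rows(rows)
```

Here the two `S1`/`tissue` rows collapse into a single entry whose term IDs and labels are delimiter-joined, while `S2`/`disease` stays a single-term entry.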
| Parameters: |
|
|---|
clean(specific=False, uberon_relations=UBERON_RELATIONS, mondo_relations=MONDO_RELATIONS, uberon_systems=UBERON_SYSTEMS, mondo_systems=MONDO_SYSTEMS)
Remove empty and undesired annotation entries.
Drops a source entry when every value is in UNDESIRED, or when
ecode is the only key left after filtering. Entries left with no
substantive annotations after this cleaning are dropped as well.
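A minimal sketch of the two dropping rules, assuming the combined dict is nested as entry → source name → field dict. That layout, the contents of `UNDESIRED`, and the sample data are assumptions for illustration. The `specific`, relations, and systems parameters are not modelled here.

```python
# Hypothetical set of placeholder values treated as "no annotation".
UNDESIRED = {"", "NA", "unknown"}


def clean_entries(combined):
    """Drop uninformative source entries, then entries with no sources left."""
    cleaned = {}
    for entry_key, sources in combined.items():
        kept_sources = {}
        for source_name, fields in sources.items():
            # Filter out fields whose value is undesired.
            kept = {k: v for k, v in fields.items() if v not in UNDESIRED}
            # Empty means every value was undesired; {"ecode"} means only
            # the evidence code survived. Both carry no annotation.
            if not kept or set(kept) == {"ecode"}:
                continue
            kept_sources[source_name] = kept
        if kept_sources:
            # Keep the entry only if some source is still substantive.
            cleaned[entry_key] = kept_sources
    return cleaned


# Made-up input: S1 has one useful source, S2 has none.
combined = {
    "S1": {
        "src_a": {"term_id": "TERM:1", "label": "liver", "ecode": "E1"},
        "src_b": {"term_id": "NA", "label": "", "ecode": "E2"},
    },
    "S2": {
        "src_a": {"term_id": "unknown", "label": "NA", "ecode": "E1"},
    },
}
cleaned = clean_entries(combined)
```

In this example `src_b` is dropped from `S1` because only its ecode survives filtering, and `S2` disappears entirely because its single source is dropped for the same reason.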
| Parameters: |
|
|---|
| Returns: |
|
|---|
save(output_path)
Save the combined annotation dict to a BSON file.
| Parameters: |
|
|---|