Progress tracking utilities for long-running operations.

Provides progress bars for pipeline stages, data processing, and other long-running operations that handle large amounts of data.

ProgressTracker

Wraps tqdm to provide consistent progress tracking across the package.

Attributes:
  • bar (tqdm) –

    Underlying tqdm progress bar

  • desc (str) –

    Description of the operation being tracked

__init__(total=None, desc='Processing', unit='items', disable=False)

Initialize progress tracker.

Parameters:
  • total (int | None, default: None) –

    Total number of items to process. Pass None if the total is unknown

  • desc (str, default: 'Processing') –

    Description to display with the progress bar

  • unit (str, default: 'items') –

    Unit of items being processed (e.g., "files", "samples")

  • disable (bool, default: False) –

    If True, disable the progress bar (useful for silent mode)
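The effect of `total` and `disable` can be pictured with a small formatting helper. This is a hypothetical, stdlib-only sketch: the name `render_status` and its output format are assumptions and do not reproduce tqdm's layout. A percentage only makes sense when `total` is known, so with `total=None` just the running count is shown, and `disable=True` suppresses output entirely.

```python
from typing import Optional


def render_status(n: int, total: Optional[int], desc: str = "Processing",
                  unit: str = "items", disable: bool = False) -> str:
    """Build a one-line status string (hypothetical format, not tqdm's layout).

    Returns an empty string when disabled, mirroring silent mode.
    """
    if disable:
        return ""
    if total is None:
        # Unknown total: no percentage can be computed, so show the count only.
        return f"{desc}: {n} {unit}"
    # Known total: treat total == 0 as complete to avoid dividing by zero.
    pct = 100 * n // total if total else 100
    return f"{desc}: {n}/{total} {unit} ({pct}%)"


print(render_status(3, 10, desc="Reading", unit="files"))  # Reading: 3/10 files (30%)
print(render_status(3, None, unit="samples"))              # Processing: 3 samples
```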

update(n=1)

Update progress by n items.

Parameters:
  • n (int, default: 1) –

    Number of items to increment progress by

set_description(desc)

Update the description text.

Parameters:
  • desc (str) –

    New description to display

set_postfix(**kwargs)

Set postfix statistics to display.

Parameters:
  • **kwargs (Any, default: {}) –

    Key-value pairs to display (e.g., errors=5, warnings=12)
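`set_postfix` appends `key=value` statistics after the bar. The following is a minimal sketch of that formatting. It assumes comma-separated pairs in the order the keywords are passed; tqdm's own rendering may differ, for example in key ordering and number formatting. `format_postfix` is a hypothetical name.

```python
from typing import Any


def format_postfix(**kwargs: Any) -> str:
    """Join keyword arguments as 'key=value' pairs, in the order given."""
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())


print(format_postfix(errors=5, warnings=12))  # errors=5, warnings=12
```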

close()

Close the progress bar.

__enter__()

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)

Context manager exit.
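Because `ProgressTracker` implements `__enter__`/`__exit__`, a `with` block ensures `close()` runs even if the body raises. The sketch below is a stdlib-only stand-in that writes plain text lines in place of a tqdm bar. The class name `TrackerSketch`, its `n`/`closed`/`file` attributes, and the output format are assumptions; the real class delegates these calls to tqdm. It shows the context-manager pattern together with `update`, `set_description`, and `set_postfix`.

```python
import io
import sys
from typing import Any, Optional, TextIO


class TrackerSketch:
    """Hypothetical stand-in for ProgressTracker that prints text lines instead of a tqdm bar."""

    def __init__(self, total: Optional[int] = None, desc: str = "Processing",
                 unit: str = "items", disable: bool = False,
                 file: TextIO = sys.stderr) -> None:
        self.total, self.desc, self.unit, self.disable = total, desc, unit, disable
        self.file = file
        self.n = 0
        self.postfix = ""
        self.closed = False

    def _refresh(self) -> None:
        # Silent mode: skip all output.
        if self.disable:
            return
        count = f"{self.n}/{self.total}" if self.total is not None else str(self.n)
        tail = f" [{self.postfix}]" if self.postfix else ""
        self.file.write(f"{self.desc}: {count} {self.unit}{tail}\n")

    def update(self, n: int = 1) -> None:
        self.n += n
        self._refresh()

    def set_description(self, desc: str) -> None:
        self.desc = desc
        self._refresh()

    def set_postfix(self, **kwargs: Any) -> None:
        self.postfix = ", ".join(f"{k}={v}" for k, v in kwargs.items())
        self._refresh()

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "TrackerSketch":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Always close; returning None lets any exception propagate.
        self.close()


out = io.StringIO()
with TrackerSketch(total=3, desc="Loading", unit="files", file=out) as bar:
    for _ in range(3):
        bar.update()
    bar.set_postfix(errors=0)
print(out.getvalue().splitlines()[-1])  # Loading: 3/3 files [errors=0]
```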

StageProgress

Track progress through multiple pipeline stages.

Attributes:
  • stages (list[str]) –

    Names of all pipeline stages

  • current_stage_idx (int) –

    Index of the current stage

  • disable (bool) –

    Whether to disable progress output

__init__(stages, disable=False)

Initialize stage progress tracker.

Parameters:
  • stages (list[str]) –

    List of stage names in order

  • disable (bool, default: False) –

    If True, disable progress output

start()

Start tracking overall pipeline progress.

start_stage(stage_name, total_items=None)

Start a new pipeline stage.

Parameters:
  • stage_name (str) –

    Name of the stage being started

  • total_items (int | None, default: None) –

    Number of items to process in this stage

update_stage(n=1)

Update progress within the current stage.

Parameters:
  • n (int, default: 1) –

    Number of items to increment by

end_stage()

Complete the current stage and move to the next.

finish()

Finish tracking and close all progress bars.

__enter__()

Context manager entry.

__exit__(exc_type, exc_val, exc_tb)

Context manager exit.
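The stage lifecycle (`start`, then `start_stage` / `update_stage` / `end_stage` per stage, then `finish`) can be traced with a hypothetical in-memory stand-in. This sketch records messages in a list instead of drawing bars. Its name `StageSketch` and its `log`, `stage_counts`, and `stage_totals` attributes are assumptions. It also assumes that `__enter__` calls `start()` and `__exit__` calls `finish()`, which the reference above does not state.

```python
from typing import Optional


class StageSketch:
    """Hypothetical stand-in for StageProgress that records messages in a list."""

    def __init__(self, stages: list[str], disable: bool = False) -> None:
        self.stages = stages
        self.disable = disable
        self.current_stage_idx = 0
        self.current_name: Optional[str] = None
        self.stage_counts: dict[str, int] = {}
        self.stage_totals: dict[str, Optional[int]] = {}
        self.log: list[str] = []

    def _emit(self, msg: str) -> None:
        if not self.disable:
            self.log.append(msg)

    def start(self) -> None:
        self._emit(f"Pipeline: 0/{len(self.stages)} stages")

    def start_stage(self, stage_name: str, total_items: Optional[int] = None) -> None:
        # total_items would size a per-stage bar; here it is only recorded.
        self.current_name = stage_name
        self.stage_counts[stage_name] = 0
        self.stage_totals[stage_name] = total_items
        self._emit(f"Stage {self.current_stage_idx + 1}/{len(self.stages)}: {stage_name}")

    def update_stage(self, n: int = 1) -> None:
        self.stage_counts[self.current_name] += n

    def end_stage(self) -> None:
        # Completing a stage advances the overall pipeline counter by one.
        self.current_stage_idx += 1
        self._emit(f"Pipeline: {self.current_stage_idx}/{len(self.stages)} stages")

    def finish(self) -> None:
        self._emit("Pipeline finished")

    # Assumption: entering calls start() and exiting calls finish().
    def __enter__(self) -> "StageSketch":
        self.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.finish()


with StageSketch(["load", "clean", "save"]) as sp:
    for stage in sp.stages:
        sp.start_stage(stage, total_items=2)
        sp.update_stage(2)
        sp.end_stage()

print(sp.log)
```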

track_progress(iterable, desc='Processing', total=None, unit='items', disable=False)

Wrap an iterable with a progress bar.

Parameters:
  • iterable (Iterable) –

    Items to iterate over

  • desc (str, default: 'Processing') –

    Description to display with the progress bar

  • total (int | None, default: None) –

    Total number of items. If None, the total is inferred from the iterable where possible

  • unit (str, default: 'items') –

    Unit of items being processed

  • disable (bool, default: False) –

    If True, disable the progress bar

Returns:
  • Iterable

    Wrapped iterable with progress tracking

Examples:

>>> for item in track_progress(items, desc="Processing samples"):
...     process(item)
>>> files = ["file1.txt", "file2.txt", "file3.txt"]
>>> for file in track_progress(files, desc="Reading files", unit="files"):
...     read_file(file)
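`track_progress` acts as a pass-through: items come out unchanged while a counter advances behind them. The following stdlib-only sketch is a generator that infers the total from `len(iterable)` when `total` is None and the iterable is sized. tqdm behaves this way; whether this package does is an assumption. The name `track_sketch` and the `report` callback, used here in place of drawing a bar, are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator, Sized
from typing import Optional, TypeVar

T = TypeVar("T")


def track_sketch(iterable: Iterable[T], desc: str = "Processing",
                 total: Optional[int] = None, unit: str = "items",
                 disable: bool = False,
                 report: Callable[[str], None] = print) -> Iterator[T]:
    """Yield items unchanged, reporting 'desc: n/total unit' after each one."""
    if total is None and isinstance(iterable, Sized):
        total = len(iterable)  # sized containers (lists, ranges) give a known total
    for n, item in enumerate(iterable, start=1):
        yield item
        if not disable:
            count = f"{n}/{total}" if total is not None else str(n)
            report(f"{desc}: {count} {unit}")


messages: list[str] = []
files = ["file1.txt", "file2.txt", "file3.txt"]
names = [f.upper() for f in track_sketch(files, desc="Reading files",
                                         unit="files", report=messages.append)]
print(messages[-1])  # Reading files: 3/3 files
```

Plain generators have no length, so in that case the sketch falls back to showing the running count only.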

parallel_progress(func, items, desc='Processing', n_workers=4, disable=False)

Process items in parallel with a progress bar.

Parameters:
  • func (Callable) –

    Function to apply to each item

  • items (list) –

    List of items to process

  • desc (str, default: 'Processing') –

    Description for the progress bar

  • n_workers (int, default: 4) –

    Number of parallel workers

  • disable (bool, default: False) –

    If True, disable the progress bar

Returns:
  • list

    Results from processing each item

Examples:

>>> def process_sample(sample_id):
...     # Placeholder for the real per-sample work
...     return len(sample_id)
>>> sample_ids = ["s001", "s002", "s003"]
>>> results = parallel_progress(
...     process_sample,
...     sample_ids,
...     desc="Processing samples",
...     n_workers=8
... )
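One plausible shape for `parallel_progress` uses `concurrent.futures`. This is an assumption: the reference does not say whether the workers are threads or processes, or whether results keep input order. The sketch below uses a thread pool and returns results in input order, while the progress count advances in completion order. The name `parallel_sketch` and the `report` callback are hypothetical.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any


def parallel_sketch(func: Callable[[Any], Any], items: list, desc: str = "Processing",
                    n_workers: int = 4, disable: bool = False,
                    report: Callable[[str], None] = print) -> list:
    """Apply func to each item on a thread pool; return results in input order."""
    results: list = [None] * len(items)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map each future back to its input position so order is preserved.
        futures = {pool.submit(func, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()  # re-raises worker exceptions
            if not disable:
                report(f"{desc}: {done}/{len(items)}")
    return results


def process_sample(sample_id: str) -> int:
    return len(sample_id)  # placeholder work


print(parallel_sketch(process_sample, ["a", "bb", "ccc"], n_workers=2, disable=True))  # [1, 2, 3]
```

Threads suit I/O-bound work. For CPU-bound functions, `ProcessPoolExecutor` is the usual swap, provided `func` and the items can be pickled.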