Downloads study annotations from the Gemma REST API.

Fetches studies in batches of QUERY_LIMIT, saves each batch to a temporary directory, then combines the non-empty batches into a single JSON file with the structure {batch_index: [study, ...]}.
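
Because JSON object keys are always strings, the batch indices in the combined file come back as "0", "1", "2", and so on. A minimal sketch of reading the file back into a flat sequence of studies, assuming only the {batch_index: [study, ...]} structure described above (iter_studies is a hypothetical helper, not part of this module):

```python
import json
from pathlib import Path


def iter_studies(path):
    """Yield every study from a combined file, in batch order.

    Hypothetical helper: assumes the file holds {batch_index: [study, ...]}.
    """
    batches = json.loads(Path(path).read_text())
    # Keys are strings after a JSON round trip, so sort them numerically;
    # a plain string sort would place "10" before "2".
    for key in sorted(batches, key=int):
        yield from batches[key]
```

Sorting with key=int keeps the batches in download order regardless of how the keys were written.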

Attributes:
  • BASE_URL (str) –

    Gemma v2 REST API endpoint for dataset queries.

  • QUERY_LIMIT (int) –

    Number of studies requested per API call, following the Gemma API documentation.

fetch(output_path=GEMMA_RAW, query='sort=-id', max_studies=60000)

Download Gemma annotations and save to a JSON file.

Fetches studies in batches of QUERY_LIMIT, writes each batch to a temporary directory, then combines all non-empty batches into a single JSON file at output_path. The temporary directory is always cleaned up, even on failure.
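
The stage-then-combine flow described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual implementation: fetch_sketch and the injected get_batch callable are hypothetical, the QUERY_LIMIT value of 100 is assumed, and the HTTP request is deliberately left out so the sketch runs offline.

```python
import json
import shutil
import tempfile
from pathlib import Path

QUERY_LIMIT = 100  # assumed value; the real constant is defined by the module


def fetch_sketch(output_path, get_batch, max_studies=60000):
    """Stage each batch in a temp dir, then merge non-empty batches.

    get_batch(offset, limit) is a hypothetical stand-in for the HTTP call;
    it must return a list of study dicts (possibly empty).
    """
    output_path = Path(output_path)
    tmp_dir = Path(tempfile.mkdtemp(prefix="gemma_"))
    try:
        # Stage: one JSON file per batch, named by its batch index.
        for index, offset in enumerate(range(0, max_studies, QUERY_LIMIT)):
            limit = min(QUERY_LIMIT, max_studies - offset)
            studies = get_batch(offset, limit)
            (tmp_dir / f"{index}.json").write_text(json.dumps(studies))

        # Combine: keep only non-empty batches, in numeric batch order.
        combined = {}
        for batch_file in sorted(tmp_dir.glob("*.json"), key=lambda p: int(p.stem)):
            studies = json.loads(batch_file.read_text())
            if studies:
                combined[batch_file.stem] = studies

        output_path.write_text(json.dumps(combined))
        return output_path
    finally:
        # Runs on success and on any exception, so the temp dir never lingers.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Placing the cleanup in a finally block is one way to get the "always cleaned up, even on failure" behavior. Because the output file is written only after every batch has been staged, an exception raised by get_batch leaves output_path untouched in this sketch.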

Parameters:
  • output_path (Path, default: GEMMA_RAW ) –

    Destination file path for the combined JSON output. Defaults to the package-wide GEMMA_RAW constant.

  • query (str, default: 'sort=-id' ) –

    Gemma API query string appended to the base URL (e.g. "sort=-id").

  • max_studies (int, default: 60000 ) –

    Upper bound on the number of studies to download.
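
To illustrate how query and max_studies might translate into individual requests, here is a hedged sketch of building per-batch URLs. The BASE_URL value, the QUERY_LIMIT value, and the offset and limit parameter names are assumptions made for illustration; batch_urls is a hypothetical helper, and the module may assemble its URLs differently.

```python
BASE_URL = "https://gemma.msl.ubc.ca/rest/v2/datasets"  # assumed endpoint
QUERY_LIMIT = 100  # assumed batch size


def batch_urls(query="sort=-id", max_studies=60000):
    """Return one URL per batch, covering at most max_studies studies.

    Hypothetical helper: assumes paging via 'offset' and 'limit' parameters.
    """
    urls = []
    for offset in range(0, max_studies, QUERY_LIMIT):
        # Shrink the final batch so the total never exceeds max_studies.
        limit = min(QUERY_LIMIT, max_studies - offset)
        urls.append(f"{BASE_URL}?{query}&offset={offset}&limit={limit}")
    return urls
```

With these assumed values, the default max_studies of 60000 corresponds to 600 requests.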

Returns:
  • Path

    Path to the saved JSON file.

Raises:
  • HTTPError

    If any batch request returns a non-2xx status.