xarray_beam.normalize_chunks¶

xarray_beam.normalize_chunks(chunks, template, split_vars=False, previous_chunks=None)[source]¶

Normalize chunks for a xarray.Dataset.

This function interprets various chunk specifications (e.g., integer sizes or numbers of bytes) and returns a dictionary mapping dimension names to concrete integer chunk sizes. It uses dask.array.api.normalize_chunks under the hood.

Chunk specifications for each dimension can be one of the following:

-1: along this dimension chunks should be the full size of the dimension.
An integer: the exact chunk size for this dimension.
A byte-string (e.g., “64MiB”, “1GB”): indicates that dask should pick chunk sizes to aim for chunks of approximately this size.

Only a single string value indicating a number of bytes can be specified. To indicate that chunking applies to multiple dimensions, use a dict key of ....

Some examples:

chunks={'time': 100}: Each chunk will have exactly 100 elements along the ‘time’ dimension.
chunks="200MB": Create chunks that are approximately 200MB in size.
chunks={'time': -1, ...: "100MB"}: Chunks should include the full ‘time’ dimension, and be chunked along other dimensions such that resulting chunks are approximately 100MiB in size.

Parameters:

chunks (Mapping[str | EllipsisType, int | str] | int | str) – The desired chunking scheme. Can either be a dictionary mapping dimension names to chunk sizes, or a single string/integer chunk specification (e.g., ‘100MB’) to be applied as the default for all dimensions. Dimensions not included in the dictionary default to previous_chunks (if available) or the full size of the dimension. A dict key of ellipsis (…) can also be used to indicate “all other dimensions”.
template (Dataset) – An xarray.Dataset providing dimension sizes and dtype information, used for calculating chunk sizes in bytes.
split_vars (bool) – If True, chunk size limits are applied per-variable, based on the largest variable’s dtype. If False, limits are applied to chunks containing all variables, based on the sum of dtypes for all variables.
previous_chunks (Mapping[str, int] | None) – If provided, hints to dask that chunks should be multiples of previous_chunks, if possible.

Returns:

A dictionary mapping all dimension names to integer chunk sizes.

Return type:

dict[str, int]