xarray_beam.normalize_chunks
- xarray_beam.normalize_chunks(chunks, template, split_vars=False, previous_chunks=None)
Normalize chunks for an xarray.Dataset.
This function interprets various chunk specifications (e.g., integer sizes or numbers of bytes) and returns a dictionary mapping dimension names to concrete integer chunk sizes. It uses dask.array.api.normalize_chunks under the hood.
- Chunk specifications for each dimension can be one of the following:
  - -1: chunks along this dimension span the full size of the dimension.
  - An integer: the exact chunk size for this dimension.
  - A byte-string (e.g., “64MiB”, “1GB”): dask picks chunk sizes that aim for approximately this many bytes per chunk.
  Only a single string value indicating a number of bytes can be specified. To apply that byte-based chunking to multiple dimensions, use ... (Ellipsis) as a dict key.
- Some examples:
  - chunks={'time': 100}: chunks have 100 elements along the ‘time’ dimension (the last chunk may be shorter if 100 does not evenly divide the dimension size).
  - chunks="200MB": create chunks that are approximately 200MB in size.
  - chunks={'time': -1, ...: "100MB"}: chunks span the full ‘time’ dimension and are split along the other dimensions so that each chunk is approximately 100MB in size.
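The integer rules above, together with the fallback for unlisted dimensions described under Parameters (previous_chunks if available, otherwise the full dimension size), can be sketched in plain Python. This is a simplified illustration, not xarray_beam's implementation: expand_int_chunks is a hypothetical helper, the dimension sizes are made up, and byte-string specs, which xarray_beam resolves through dask, are deliberately left out.

```python
from __future__ import annotations

from collections.abc import Mapping


def expand_int_chunks(
    chunks: Mapping[object, int | str] | int | str,
    dim_sizes: Mapping[str, int],
    previous_chunks: Mapping[str, int] | None = None,
) -> dict[str, int]:
    """Hypothetical helper: resolve integer chunk specs for every dimension.

    Byte-string specs are not handled here; xarray_beam delegates those to dask.
    """
    if not isinstance(chunks, Mapping):
        # A bare int or string acts as the default for all dimensions.
        chunks = {...: chunks}
    default = chunks.get(...)  # value for "all other dimensions", if given
    previous_chunks = previous_chunks or {}

    resolved = {}
    for dim, size in dim_sizes.items():
        spec = chunks.get(dim, default)
        if spec is None:
            # Unlisted dimension and no ellipsis entry: use previous_chunks,
            # otherwise the full dimension size.
            spec = previous_chunks.get(dim, size)
        if isinstance(spec, str):
            raise NotImplementedError(f"byte-string spec {spec!r} needs dask")
        if spec == -1:
            resolved[dim] = size  # a single chunk spanning the whole dimension
        elif spec > 0:
            resolved[dim] = min(spec, size)  # a chunk cannot exceed the dimension
        else:
            raise ValueError(f"invalid chunk size {spec!r} for {dim!r}")
    return resolved


sizes = {"time": 365, "lat": 180, "lon": 360}
print(expand_int_chunks({"time": 100}, sizes))
# {'time': 100, 'lat': 180, 'lon': 360}
print(expand_int_chunks({"time": -1, ...: 50}, sizes))
# {'time': 365, 'lat': 50, 'lon': 50}
```

Keeping the ellipsis entry ahead of previous_chunks in the lookup order reflects the description above: an explicit “all other dimensions” value is a user choice, while previous_chunks and the full size are only fallbacks.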
- Parameters:
chunks (Mapping[str | EllipsisType, int | str] | int | str) – The desired chunking scheme: either a dictionary mapping dimension names to chunk sizes, or a single string or integer chunk specification (e.g., ‘100MB’) applied as the default for all dimensions. Dimensions not included in the dictionary default to previous_chunks (if available) or the full size of the dimension. A dict key of ellipsis (...) can also be used to indicate “all other dimensions”.
template (Dataset) – An xarray.Dataset providing dimension sizes and dtype information, used to calculate chunk sizes in bytes.
split_vars (bool) – If True, chunk size limits are applied per variable, based on the largest dtype among the variables. If False, limits are applied to chunks containing all variables, based on the sum of the dtype sizes of all variables.
previous_chunks (Mapping[str, int] | None) – If provided, hints to dask that chunks should be multiples of previous_chunks, if possible.
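The effect of split_vars on a byte budget comes down to how many bytes one element contributes to a chunk. The sketch below assumes, following the split_vars description above, that True sizes chunks by the largest dtype item size and False by the sum of item sizes; the variable names and dtypes are hypothetical, and dask's actual chunk-size search (which also balances dimensions and honors previous_chunks) does more than this single division.

```python
import numpy as np

# Hypothetical data variables and their dtypes (stand-ins for a template Dataset).
VAR_DTYPES = {"temperature": np.float32, "precip": np.float64, "mask": np.int8}

# dask reads "MB" as decimal megabytes; "MiB" would be 100 * 2**20 bytes.
TARGET_100MB = 100 * 10**6


def bytes_per_element(dtypes, split_vars):
    """Bytes that one element (one index along every dimension) adds to a chunk.

    split_vars=True: each chunk holds a single variable, so budget for the
    largest item size. split_vars=False: each chunk holds every variable,
    so add up all item sizes.
    """
    itemsizes = [np.dtype(d).itemsize for d in dtypes.values()]
    return max(itemsizes) if split_vars else sum(itemsizes)


def max_elements_per_chunk(target_bytes, dtypes, split_vars):
    """Largest number of elements per chunk that stays within target_bytes."""
    return target_bytes // bytes_per_element(dtypes, split_vars)


print(bytes_per_element(VAR_DTYPES, split_vars=True))   # 8 (float64)
print(bytes_per_element(VAR_DTYPES, split_vars=False))  # 13 (4 + 8 + 1)
print(max_elements_per_chunk(TARGET_100MB, VAR_DTYPES, split_vars=True))   # 12500000
print(max_elements_per_chunk(TARGET_100MB, VAR_DTYPES, split_vars=False))  # 7692307
```

With split_vars=False every chunk carries all three variables, so fewer elements fit into the same 100MB budget than when each variable is chunked on its own.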
- Returns:
A dictionary mapping all dimension names to integer chunk sizes.
- Return type:
dict[str, int]