xarray_beam.Rechunk¶
- class xarray_beam.Rechunk(dim_sizes, source_chunks, target_chunks, itemsize, min_mem=None, max_mem=1073741824)¶
Rechunk to an arbitrary new chunking scheme with bounded memory usage.
The approach taken here builds on Rechunker [1], but differs in two key ways:
It is performed via collective Beam operations, instead of writing intermediates arrays to disk.
It is performed collectively on full xarray.Dataset objects, instead of NumPy arrays.
[1] rechunker.readthedocs.io
- Parameters:
dim_sizes (Mapping[str, int])
source_chunks (Mapping[str, int | tuple[int, ...]])
target_chunks (Mapping[str, int | tuple[int, ...]])
itemsize (int)
min_mem (int | None)
max_mem (int)
- __init__(dim_sizes, source_chunks, target_chunks, itemsize, min_mem=None, max_mem=1073741824)[source]¶
Initialize Rechunk().
- Parameters:
dim_sizes (Mapping[str, int]) – size of the full (combined) dataset of all chunks.
source_chunks (Mapping[str, int | tuple[int, ...]]) – sizes of source chunks. Missing keys or values equal to -1 indicate “non-chunked” dimensions.
target_chunks (Mapping[str, int | tuple[int, ...]]) – sizes of target chunks, like source_keys. Keys must exactly match those found in source_chunks.
itemsize (int) – approximate number of bytes per xarray.Dataset element, after indexing out by all dimensions, e.g., 4 * len(dataset) for float32 data or roughly dataset.nbytes / np.prod(dataset.sizes).
min_mem (int | None) – minimum memory that a single intermediate chunk must consume.
max_mem (int) – maximum memory that a single intermediate chunk may consume.
Methods
__init__(dim_sizes, source_chunks, ...[, ...])Initialize Rechunk().
annotations()default_label()default_type_hints()display_data()Returns the display data associated to a pipeline component.
expand(pcoll)from_runner_api(proto, context)get_resource_hints()get_type_hints()Gets and/or initializes type hints for this object.
get_windowing(inputs)Returns the window function to be associated with transform's output.
infer_output_type(unused_input_type)register_urn(urn, parameter_type[, constructor])runner_api_requires_keyed_input()to_runner_api(context[, has_parts])to_runner_api_parameter(unused_context)to_runner_api_pickled(context)type_check_inputs(pvalueish)type_check_inputs_or_outputs(pvalueish, ...)type_check_outputs(pvalueish)with_input_types(input_type_hint)Annotates the input type of a
PTransformwith a type-hint.with_output_types(type_hint)Annotates the output type of a
PTransformwith a type-hint.with_resource_hints(**kwargs)Adds resource hints to the
PTransform.Attributes
labelpipelineside_inputs