xarray_beam.Dataset.to_zarr
- Dataset.to_zarr(path, *, zarr_chunks_per_shard=None, zarr_chunks=None, zarr_shards=None, zarr_format=None, stage_locally=None, label=None)
Write this dataset to a Zarr file.
The extensive options for controlling chunking and sharding are intended for power users.
If you are happy with the existing chunk sizes of your dataset and just want to write it to disk, you can omit all of them.
Consider specifying only zarr_chunks_per_shard to allow for more flexible and efficient reading of data from disk. This divides each dataset chunk into much smaller Zarr chunks on disk, with each dataset chunk stored in a single Zarr shard.
- Parameters:
path (str) – Path to write to.
zarr_chunks_per_shard (Mapping[str | EllipsisType, int] | None) – If provided, write this dataset into Zarr shards, each with at most this many Zarr chunks per shard (requires Zarr v3). Dimensions not included in zarr_chunks_per_shard default to 1 chunk per shard, unless a dict key of ellipsis (...) is used to indicate a different default.
zarr_chunks (Mapping[str | EllipsisType, int | str] | int | str | None) – Explicit chunk sizes to use for storing data in Zarr, as an alternative to specifying zarr_chunks_per_shard. Zarr chunk sizes must evenly divide the existing chunk sizes of this dataset.
zarr_shards (Mapping[str | EllipsisType, int | str] | int | str | None) – Explicit shard sizes to use for storing data in Zarr. Shard sizes must evenly divide the existing chunk sizes of this dataset and be integer multiples of the Zarr chunk sizes. Requires Zarr v3. By default, Zarr sharding is not used unless zarr_chunks_per_shard is provided, in which case Zarr shards default to the chunk sizes of this dataset.
zarr_format (int | None) – Optional integer specifying the explicit Zarr format to use. Defaults to Zarr v3 if using shards; otherwise, defaults to the default format of your installed version of Zarr.
stage_locally (bool | None) – If True, write Zarr metadata to a local temporary directory before copying it to the store in parallel. This can significantly speed up setup on high-latency filesystems. By default, local staging is used if possible, which is the case as long as the destination is provided as a string or path.
label (str | None) – A unique name for this stage of the pipeline. Defaults to None, in which case a name will be generated.
- Returns:
Beam transform that writes the dataset to a Zarr file.
- Return type:
PTransform | PCollection