xarray_beam.Dataset.from_ptransform

classmethod Dataset.from_ptransform(ptransform, *, template, chunks, split_vars=False, label=None)

Create an xarray_beam.Dataset from a Beam PTransform.

This is an advanced constructor that allows you to create an xarray_beam.Dataset from an existing Beam PTransform that produces (Key, xarray.Dataset) pairs.

The PTransform should produce chunks that conform to the given template, chunks, and split_vars arguments. This constructor appends a validation step to the PTransform that normalizes keys into the strictest possible form implied by the other arguments and checks that the transform's outputs are valid.

Parameters:
  • ptransform (PTransform | PCollection) – A Beam collection of (Key, xarray.Dataset) pairs. You only need to set offsets on these keys; vars is set automatically from the dataset if split_vars is True.

  • template (Dataset) – An xarray.Dataset object representing the schema (coordinates, dimensions, data variables, and attributes) of the full dataset, as produced by xarray_beam.make_template(), with data variables backed by Dask arrays.

  • chunks (Mapping[str | EllipsisType, int]) – A dictionary mapping dimension names to integer chunk sizes. Every chunk produced by ptransform must have dimensions of these sizes, except for the last chunk in each dimension, which may be smaller.

  • split_vars (bool) – Whether the chunks in ptransform are split across variables (True) or each chunk contains all variables (False). Defaults to False.

  • label (str | None) – A unique name for this stage of the pipeline. Defaults to None, in which case a name will be generated.
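The rule for chunks (every chunk has the stated size, except that the last chunk along each dimension may be smaller) can be sketched in plain Python. This is only an illustration of the rule, not the validation code that xarray_beam runs. The helper names expected_chunk_sizes, expected_offsets, and chunk_is_valid are hypothetical, and the sketch assumes plain dimension-name keys in chunks (no Ellipsis entry) and offsets that are multiples of the chunk size.

```python
from itertools import product


def expected_chunk_sizes(dim_size, chunk_size):
    """Sizes of consecutive chunks along one dimension.

    Only the final chunk may be smaller than chunk_size.
    """
    full, remainder = divmod(dim_size, chunk_size)
    return [chunk_size] * full + ([remainder] if remainder else [])


def expected_offsets(sizes, chunks):
    """Origins of every chunk, one dict of {dimension: offset} per chunk."""
    dims = list(sizes)
    per_dim = [range(0, sizes[dim], chunks[dim]) for dim in dims]
    return [dict(zip(dims, combo)) for combo in product(*per_dim)]


def chunk_is_valid(offsets, chunk_shape, sizes, chunks):
    """Check one chunk's offsets and shape against the chunking rule.

    Hypothetical check: assumes offsets fall on multiples of the chunk size.
    """
    for dim, offset in offsets.items():
        if offset % chunks[dim] != 0 or not 0 <= offset < sizes[dim]:
            return False
        # A chunk is full-sized unless it runs into the end of the dimension.
        if chunk_shape[dim] != min(chunks[dim], sizes[dim] - offset):
            return False
    return True


sizes = {"x": 10, "y": 3}   # full dataset dimensions (from the template)
chunks = {"x": 4, "y": 3}   # requested chunk sizes

print(expected_chunk_sizes(sizes["x"], chunks["x"]))
print(expected_offsets(sizes, chunks))
print(chunk_is_valid({"x": 8, "y": 0}, {"x": 2, "y": 3}, sizes, chunks))
```

Under these assumptions, a dimension of size 10 with a chunk size of 4 yields chunks of sizes 4, 4, and 2 at offsets 0, 4, and 8, so a chunk at offset 8 should have length 2 along that dimension.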

Returns:

An xarray_beam.Dataset instance wrapping the PTransform.

Return type:

Dataset