xarray_beam.Dataset

class xarray_beam.Dataset(template, chunks, split_vars, ptransform)

Experimental high-level representation of an Xarray-Beam dataset.

Parameters:
  • template (xarray.Dataset)

  • chunks (Mapping[str, int])

  • split_vars (bool)

  • ptransform (beam.PTransform | beam.PCollection | _LazyPCollection)

__init__(template, chunks, split_vars, ptransform)

Low level interface for creating a new Dataset, without validation.

Prefer xarray_beam.Dataset.from_ptransform() unless you’re certain you don’t need validation.

Parameters:
  • template (Dataset) – xarray.Dataset describing the structure of this dataset, typically as produced by xarray_beam.make_template().

  • chunks (Mapping[str, int]) – mapping from dimension names to chunk sizes. For normalization, use xarray_beam.normalize_chunks().

  • split_vars (bool) – whether variables are split between separate elements in the ptransform, or all stored in the same element.

  • ptransform (PTransform | PCollection | _LazyPCollection) – Beam collection of (xbeam.Key, xarray.Dataset) tuples with this dataset’s data.
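To make the roles of chunks and split_vars more concrete, here is a minimal pure-Python sketch of which elements a collection might contain. It does not use xarray_beam: plain tuples stand in for xbeam.Key, the helper element_keys is hypothetical, and it assumes every variable spans every chunked dimension and that chunks start at offset 0.

```python
import itertools


def element_keys(sizes, chunks, variables, split_vars):
    """Hypothetical helper: list (offsets, variable names) for each element."""
    dims = sorted(chunks)
    # Assumption: chunk offsets start at 0 and advance by the chunk size.
    starts = [range(0, sizes[d], chunks[d]) for d in dims]
    keys = []
    for combo in itertools.product(*starts):
        offsets = tuple(zip(dims, combo))
        if split_vars:
            # One element per variable per chunk.
            keys.extend((offsets, (name,)) for name in variables)
        else:
            # One element per chunk, holding all variables.
            keys.append((offsets, tuple(variables)))
    return keys


sizes = {"time": 10, "x": 4}
chunks = {"time": 4, "x": 4}
variables = ["temp", "wind"]

together = element_keys(sizes, chunks, variables, split_vars=False)
split = element_keys(sizes, chunks, variables, split_vars=True)
print(len(together), len(split))
```

Under these assumptions, splitting variables multiplies the number of elements by the number of variables while leaving the chunk offsets unchanged.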

Methods

__init__(template, chunks, split_vars, ...)

Low level interface for creating a new Dataset, without validation.

collect_with_direct_runner()

Collect a dataset in memory by writing it to a temp file.

consolidate_variables(*[, label])

Consolidate variables in this Dataset into a single chunk.

from_ptransform(ptransform, *, template, chunks)

Create an xarray_beam.Dataset from a Beam PTransform.

from_xarray(source, chunks, *[, split_vars, ...])

Create an xarray_beam.Dataset from an xarray.Dataset.

from_zarr(path, *[, chunks, split_vars, ...])

Create an xarray_beam.Dataset from a Zarr store.

head(*[, label])

Return a Dataset with the first N elements of each dimension.

map_blocks(func, *[, template, chunks, label])

Map a function over the chunks of this dataset.

mean([dim, skipna, dtype, label])

Compute the mean of this Dataset using Beam combiners.

pipe(func, *args, **kwargs)

Apply a function to this dataset with method-chaining syntax.

rechunk(chunks[, split_vars, min_mem, ...])

Rechunk this Dataset.

split_variables(*[, label])

Split variables in this Dataset into separate chunks.

tail(*[, label])

Return a Dataset with the last N elements of each dimension.

to_zarr(path, *[, zarr_chunks_per_shard, ...])

Write this dataset to a Zarr store.

transpose(*args[, label])
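For the pipe method listed above, the following sketch shows the method-chaining convention used by pandas and xarray, where obj.pipe(func, *args, **kwargs) is equivalent to func(obj, *args, **kwargs). That xarray_beam.Dataset.pipe behaves the same way is an assumption; the class Pipeable and the functions scale and shift are hypothetical stand-ins, not part of the library.

```python
class Pipeable:
    """Hypothetical stand-in for an object that supports method chaining."""

    def __init__(self, value):
        self.value = value

    def pipe(self, func, *args, **kwargs):
        # Assumed convention: pass this object as the first argument.
        return func(self, *args, **kwargs)


def scale(obj, factor):
    return Pipeable(obj.value * factor)


def shift(obj, *, by=0):
    return Pipeable(obj.value + by)


# Reads left to right instead of nesting: shift(scale(Pipeable(2), 10), by=1)
result = Pipeable(2).pipe(scale, 10).pipe(shift, by=1)
print(result.value)
```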

Attributes

bytes_per_chunk

Estimate of the number of bytes per dataset chunk.

chunk_count

Count the number of chunks in this dataset.

chunks

Dictionary mapping from dimension names to chunk sizes.

itemsize

Total size of dtype itemsizes in a PTransform element, in bytes.

ptransform

Beam PTransform of (xbeam.Key, xarray.Dataset) tuples with this dataset's data.

sizes

Size of each dimension in this dataset.

split_vars

Whether variables are split between separate elements in the ptransform.

template

Template describing the structure of this dataset.
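As a rough illustration of how sizes, chunks, and itemsize could relate to chunk_count and bytes_per_chunk, the sketch below does the arithmetic by hand. The functions estimate_chunk_count and estimate_bytes_per_chunk are hypothetical, not the library's implementation, and they assume that every variable spans every dimension and that the estimate refers to a full-size chunk.

```python
import math


def estimate_chunk_count(sizes, chunks):
    """Hypothetical: number of chunks, counting partial chunks at the edges."""
    return math.prod(math.ceil(sizes[d] / chunks[d]) for d in chunks)


def estimate_bytes_per_chunk(chunks, itemsizes):
    """Hypothetical: bytes in one full chunk holding all variables."""
    elements = math.prod(chunks.values())
    return elements * sum(itemsizes.values())


sizes = {"time": 10, "x": 4}
chunks = {"time": 4, "x": 4}
itemsizes = {"temp": 4, "wind": 8}  # e.g. float32 and float64

count = estimate_chunk_count(sizes, chunks)
nbytes = estimate_bytes_per_chunk(chunks, itemsizes)
print(count, nbytes)
```

Here the time dimension needs three chunks (two full, one partial) and x needs one, and each full chunk holds 16 values per variable at 12 bytes in total per position.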