Getting Started

This guide gets you to a first working cube and pipeline as quickly as possible, while showing how the same code scales up when datasets are large.

Installation

Install from PyPI:

pip install cubedynamics

Or install the latest main branch:

pip install "git+https://github.com/CU-ESIIL/cubedynamics.git@main"

CubeDynamics runs anywhere xarray runs: laptops, HPC clusters, or hosted notebooks.

Your First Cube

Most users start by loading a cube from an existing dataset. For example, this creates a precipitation cube from PRISM:

import cubedynamics as cd

cube = cd.load_prism_cube(
    lat=40.0,
    lon=-105.25,
    start="2000-01-01",
    end="2020-12-31",
    variable="ppt",
)

Inspect the structure to confirm dimensions and metadata:

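print(cube.dims)   # names of the cube's dimensions
print(cube.shape)  # size along each dimension
print(cube.attrs)  # metadata attached to the cube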

Your First Pipeline

Pipelines are built with the pipe (|) operator and a grammar of verbs:

from cubedynamics import pipe, verbs as v

result = pipe(cube) | v.mean(dim="time") | v.plot()

This example computes the mean through time for every location in the cube and plots the result. You can swap in different verbs (e.g., v.anomaly(), v.variance(), v.month_filter([6, 7, 8])) without changing the pipeline structure.
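To make the pipe-and-verb pattern concrete, here is a minimal sketch of how a |-based pipeline can be built in plain Python. It is an illustration only, not CubeDynamics's implementation: the Pipe class, the mean and scale verbs, and the list-of-lists "cube" are all hypothetical stand-ins.

```python
from statistics import fmean


class Pipe:
    """Hypothetical wrapper that lets a value flow through verbs with `|`."""

    def __init__(self, value):
        self.value = value

    def __or__(self, verb):
        # Each verb is a plain callable: it takes the current value and
        # returns the next one, which is wrapped again so chaining continues.
        return Pipe(verb(self.value))


def mean(axis):
    """Verb factory: axis=0 collapses rows (time), axis=1 collapses columns."""
    def verb(rows):
        if axis == 0:
            return [fmean(col) for col in zip(*rows)]
        return [fmean(row) for row in rows]
    return verb


def scale(factor):
    """Verb factory: multiply every value by a constant."""
    def verb(values):
        return [factor * x for x in values]
    return verb


# Toy "cube": 3 time steps (rows) x 2 locations (columns).
toy_cube = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
result = (Pipe(toy_cube) | mean(axis=0) | scale(2.0)).value
print(result)  # [4.0, 40.0]
```

Because every verb has the same shape (value in, value out), swapping one verb for another leaves the rest of the chain untouched, which is the property the pipeline grammar relies on.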

Loading Real Datasets

The same pattern works for other datasets shipped with CubeDynamics. The load_prism_cube call above can be expanded to longer time ranges or larger spatial domains, and other loaders follow the same conventions for inputs like latitude, longitude, date ranges, and variable names.

Scaling Up Without Changing Code

Large requests automatically stream as VirtualCubes, so you do not have to rewrite your pipeline when datasets exceed memory. VirtualCubes:

  • represent a cube without materializing it upfront
  • stream tiles of data through the same verbs
  • keep your code and semantics identical at small or large scales
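The idea of representing a cube without materializing it can be illustrated with a small lazy wrapper. This sketch is illustrative only: LazyCube, fake_loader, and the tile keys are hypothetical names, not part of the CubeDynamics API.

```python
class LazyCube:
    """Hypothetical stand-in: stores how to load tiles, not the data itself."""

    def __init__(self, tile_keys, loader):
        self.tile_keys = list(tile_keys)
        self.loader = loader

    def tiles(self):
        # Generator: each tile is loaded only when the consumer asks for it.
        for key in self.tile_keys:
            yield self.loader(key)

    def materialize(self):
        # Loads every tile at once; only sensible for small requests.
        return [value for tile in self.tiles() for value in tile]


loads = []  # records which tiles have actually been loaded


def fake_loader(year):
    loads.append(year)
    return [year + 0.1, year + 0.2]  # pretend data for this tile


lazy = LazyCube([2000, 2001, 2002], fake_loader)
loads_after_creation = len(loads)      # nothing has been loaded yet

first_tile = next(lazy.tiles())        # loads exactly one tile
loads_after_first_tile = len(loads)

print(loads_after_creation, loads_after_first_tile)  # 0 1
```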

Working With Large Datasets

When you request a large area of interest or a long date range, the loader returns a VirtualCube without any extra configuration, and that VirtualCube streams tiles through the same verbs. You can inspect and control streaming when needed:

ndvi = cd.ndvi(
    lat=40.0,
    lon=-105.25,
    start="1970",
    end="2020",
    streaming_strategy="virtual",
    time_tile="5y",
)
print(ndvi)           # shows that it is a VirtualCube
ndvi.debug_tiles()    # prints time + space tiles
ndvi.materialize()    # forces full load; only for small areas

If you see slow progress or hit rate limits, try a smaller time_tile value or tighter spatial bounds.
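To get a feel for what a setting like time_tile="5y" implies, the sketch below splits a range of years into fixed-width windows. It is an assumption about how such tiling could work, not a description of CubeDynamics's internals; split_years is a hypothetical helper.

```python
def split_years(start_year, end_year, tile_years):
    """Split [start_year, end_year] (inclusive) into consecutive windows."""
    if tile_years < 1:
        raise ValueError("tile_years must be at least 1")
    windows = []
    year = start_year
    while year <= end_year:
        # The final window is clipped so it never runs past end_year.
        last = min(year + tile_years - 1, end_year)
        windows.append((year, last))
        year = last + 1
    return windows


five_year = split_years(1970, 2020, 5)
two_year = split_years(1970, 2020, 2)
print(len(five_year), five_year[0], five_year[-1])  # 11 (1970, 1974) (2020, 2020)
print(len(two_year))                                # 26
```

Smaller tiles mean more requests, but each one is smaller, which is why shrinking the tile size can help when individual requests are slow or throttled.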

When a request is too large for a normal in-memory cube, CubeDynamics:

  • splits the timeline into tiles (for example, five-year windows)
  • splits the area of interest into spatial tiles when needed
  • streams each tile through the verbs, tracking running statistics like variance or mean
  • returns a normal-looking DataArray/Dataset at the end
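Tracking running statistics across tiles is commonly done with a mergeable summary of count, mean, and sum of squared deviations, using a pairwise merge formula that generalizes Welford's online algorithm. The sketch below shows that technique in plain Python. Whether CubeDynamics uses this exact method is not stated here, and RunningStats is a hypothetical name.

```python
from statistics import fmean, pvariance


class RunningStats:
    """Mergeable count / mean / M2 summary (M2 = sum of squared deviations)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update_tile(self, values):
        # Summarize the tile on its own, then merge it into the running totals.
        n_b = len(values)
        if n_b == 0:
            return
        mean_b = fmean(values)
        m2_b = sum((x - mean_b) ** 2 for x in values)
        n = self.n + n_b
        delta = mean_b - self.mean
        self.m2 += m2_b + delta * delta * self.n * n_b / n
        self.mean += delta * n_b / n
        self.n = n

    @property
    def variance(self):
        # Population variance; use m2 / (n - 1) for the sample variance.
        return self.m2 / self.n if self.n else float("nan")


tiles = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
stats = RunningStats()
for tile in tiles:  # only one tile needs to be in memory at a time
    stats.update_tile(tile)

# Compare against the answer computed from all values at once.
all_values = [x for tile in tiles for x in tile]
print(stats.mean, fmean(all_values))
print(stats.variance, pvariance(all_values))
```

The streamed result matches the all-at-once computation up to floating-point rounding, which is what allows a tiled run to return the same kind of answer as an in-memory one.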

Common Pitfalls

  • Make sure the requested variable name matches the dataset.
  • Verify dimensions before running large analyses so operations occur over the intended axis.
  • Use streaming defaults for big pulls instead of forcing full materialization.
  • For event windows, request daily frequency (freq="D"). Monthly codes like "MS"/"ME" anchor to the start or end of a month, so a short range that contains neither can return an empty time axis.
  • Leave allow_synthetic=False unless you explicitly want demo data; provenance (source, is_synthetic, backend_error) on cubes will confirm what you received.
  • A "streaming backend unavailable" warning means CubeDynamics fell back to a download backend. Install optional dependencies or check network access before re-running if you need streaming.
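The empty-time-axis pitfall follows from what a month-start frequency means: only dates that fall on the first of a month are kept. The standard-library sketch below reproduces that logic with hypothetical month_starts and days helpers. It does not call CubeDynamics, and the event window is made up for illustration.

```python
from datetime import date, timedelta


def month_starts(start, end):
    """Return every first-of-month date within [start, end], inclusive."""
    out = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        first = date(year, month, 1)
        if first >= start:
            out.append(first)
        # Advance to the next month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def days(start, end):
    """Return every date within [start, end], inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# A ten-day event window that does not contain the first of any month.
event_start, event_end = date(2020, 6, 10), date(2020, 6, 20)
monthly_axis = month_starts(event_start, event_end)
daily_axis = days(event_start, event_end)
print(len(monthly_axis), len(daily_axis))  # 0 11
```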

Where to go next

CubeDynamics provides a unified, cube-native way to work with spatiotemporal environmental data—simple enough for quick exploration, powerful enough for large-scale scientific analysis.