Statistics verbs
In plain English:
Statistics verbs compute summaries such as the mean, variance, standard deviation, and anomalies. With VirtualCube, they track running totals across tiles, so you get the same answer whether the data streams tile by tile or sits fully in memory.
What this page helps you do:
- Apply statistical verbs to streaming cubes
- Understand how incremental calculations work
- Debug slow or surprising outputs on very large requests
Streaming statistics example
from cubedynamics import pipe, verbs as v
variance_ts = (
    pipe(cube)
    | v.month_filter([6, 7, 8])
    | v.variance(dim=("y", "x"))
)
For large cubes, each tile updates running variance trackers; the final result matches an in-memory calculation.
Behind the scenes
- Mean/variance/std: Online algorithms maintain sums and counts per tile.
- Anomalies: The baseline is computed incrementally, then applied to each tile.
- Correlations: Tile-level products are accumulated and normalized at the end.
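The mean/variance bullet above can be illustrated with a small, library-independent sketch. It is not the cubedynamics implementation: the class name RunningMoments, its update_tile method, and the choice of population variance (ddof=0) are assumptions made for illustration. It merges per-tile statistics with the standard pairwise update for count, mean, and sum of squared deviations, then checks the result against an in-memory calculation.

```python
from statistics import fmean, pvariance


class RunningMoments:
    """Hypothetical tracker: count, mean, and sum of squared deviations (M2)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update_tile(self, tile):
        """Fold one tile (any iterable of numbers) into the running totals."""
        values = list(tile)
        n_b = len(values)
        if n_b == 0:
            return
        # Statistics of the incoming tile on its own.
        mean_b = fmean(values)
        m2_b = sum((x - mean_b) ** 2 for x in values)
        # Pairwise merge of (count, mean, M2) from the two groups.
        n_a = self.count
        total = n_a + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta * delta * n_a * n_b / total
        self.count = total

    @property
    def variance(self):
        """Population variance (ddof=0); an assumption, not a library default."""
        return self.m2 / self.count if self.count else float("nan")


# Three tiles of unequal size stand in for a streamed cube.
tiles = [[1.0, 2.0, 3.0], [10.0, 12.0], [4.0, 4.0, 5.0, 9.0]]
tracker = RunningMoments()
for tile in tiles:
    tracker.update_tile(tile)

# Reference value computed with everything in memory at once.
flat = [x for tile in tiles for x in tile]
in_memory = pvariance(flat)
```

Because only three numbers are kept per tracker, memory use does not grow with the number of tiles, and the streamed variance agrees with the in-memory one up to floating-point rounding.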
Debugging statistical verbs
- Print the cube to confirm VirtualCube is active when expected.
- Call .debug_tiles() to see tile boundaries if a result seems slow.
- Use .materialize() before running a verb only when the dataset is small enough to fit in memory.
Legacy Technical Reference (kept for context)
Operations – statistic verbs
In plain English:
Statistic verbs summarize or standardize a cube along one dimension.
They build on transforms to answer questions like “how variable is summer rainfall?”
You will learn:
- How to compute means, variances, and z-scores
- How to control dimensions you keep or drop
- Where the deeper technical notes live
What this is
These functions live in cubedynamics.ops.stats and are also available under cubedynamics.verbs.
They expect an xarray cube and return an object with the same labeled coordinates unless you choose to drop dimensions.
Why it matters
Summaries like variance or z-score highlight unusual events and trends. Keeping the cube structure intact makes it easy to compare climate and vegetation or to hand off results to visualization tools.
How to use it
mean(dim="time", keep_dim=True)
Computes the average along a dimension.
from cubedynamics import pipe, verbs as v
avg = pipe(cube) | v.mean(dim="time", keep_dim=True)
Setting keep_dim=True leaves the dimension in place with length 1, which helps when you want to broadcast results later.
variance(dim="time", keep_dim=True)
Measures spread along a dimension.
var = pipe(cube) | v.variance(dim="time", keep_dim=True)
Use this to see how much a season or band varies over time.
zscore(dim="time", std_eps=1e-4)
Standardizes values by subtracting the mean and dividing by the standard deviation.
z = pipe(cube) | v.zscore(dim="time")
This returns unitless scores that show how unusual each timestep is relative to its own history.
Original Reference (kept for context)
Operations Reference – Stats
Statistic verbs summarize cubes along dimensions or compare axes. They live in cubedynamics.ops.stats and are re-exported via cubedynamics.verbs. Examples assume from cubedynamics import pipe, verbs as v and a cube variable bound to an xarray object.
mean(dim="time", keep_dim=True)
Compute the mean along a dimension.
result = pipe(cube) | v.mean(dim="time", keep_dim=True)
- Parameters
  - dim: dimension to summarize.
  - keep_dim: retain the dimension as length 1 (default) or drop it entirely.
variance(dim="time", keep_dim=True)
Compute the variance along a dimension.
result = pipe(cube) | v.variance(dim="time", keep_dim=True)
- Parameters
  - dim: dimension to collapse.
  - keep_dim: retain the dimension as length 1 (default) or drop it entirely.
- Returns: variance cube matching the input layout when keep_dim=True.
zscore(dim="time", std_eps=1e-4)
Standardize each pixel/voxel along a dimension by subtracting the mean and dividing by the standard deviation.
result = pipe(cube) | v.zscore(dim="time")
- Parameters
  - dim: dimension to standardize along.
  - std_eps: mask threshold to avoid dividing by values with near-zero spread.
- Returns: anomaly cube whose values are unitless z-scores per pixel. The verb always preserves the original cube shape.
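To make the role of std_eps concrete, here is a minimal pure-Python sketch for a single pixel's time series. It is an illustration under assumptions, not the library's code: the helper name zscore_series is hypothetical, and masking with NaN when the standard deviation is at or below std_eps is one plausible reading of "mask threshold".

```python
import math


def zscore_series(values, std_eps=1e-4):
    """Z-scores for one series; all NaN if the spread is at or below std_eps."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    # Population standard deviation (ddof=0); assumed for this sketch.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if std <= std_eps:
        # Near-constant series: mask instead of dividing by ~0.
        return [math.nan] * n
    return [(x - mean) / std for x in values]


varying = zscore_series([2.0, 4.0, 6.0])   # symmetric around the mean
constant = zscore_series([5.0, 5.0, 5.0])  # zero spread, so fully masked
```

The output has the same length as the input in both cases, which mirrors the note that the verb preserves the original cube shape.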
correlation_cube (planned)
The exported factory currently raises NotImplementedError and is reserved for a future streaming implementation.
- Intended behavior: compute rolling or full-period correlations between named data variables or coordinates, returning an xarray cube with correlation coefficients.
- Alternatives today: use xr.corr for per-pixel correlations or the rolling helpers in cubedynamics.stats.correlation/stats.tails.
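For readers curious what an accumulate-then-normalize correlation might look like, the following is a hedged, library-independent sketch. It does not describe the planned correlation_cube; the class RunningCorrelation and its methods are hypothetical. It keeps raw sums and cross-products per tile and computes Pearson's r once at the end.

```python
import math


class RunningCorrelation:
    """Hypothetical accumulator of sums and products for Pearson's r."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update_tile(self, xs, ys):
        """Add one tile of paired values to the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.syy += y * y
            self.sxy += x * y

    def result(self):
        """Normalize the accumulated products into a correlation coefficient."""
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return math.nan  # undefined when either series has no spread
        return cov / math.sqrt(var_x * var_y)


# Two tiles where y = 2x + 1, so the correlation should be 1.
acc = RunningCorrelation()
acc.update_tile([1.0, 2.0], [3.0, 5.0])
acc.update_tile([3.0, 4.0], [7.0, 9.0])
r = acc.result()
```

Raw sums of squares can lose precision when values are large relative to their spread; a production version would more likely merge centered co-moments, in the same spirit as online variance updates.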
Combine these stats with transforms and IO verbs to produce complete analyses.