Spatial & CRS Dataset Contract
Purpose and scope
This document defines the canonical spatial and coordinate reference system (CRS) contract for cube-based verbs. It is an internal, normative spec that every spatial verb (e.g., v.clip, v.mask, v.zonal_stats, v.fire_plot) must follow to ensure consistent behavior across datasets and to enable shared tests.
Cube invariants
- Time dimension required: A time dimension must exist and be parseable to a pandas DatetimeIndex via normalize_dates.
- Exactly two spatial dimensions: Cubes must expose exactly two spatial dimensions. Preferred names are ("y", "x"); the fallback is ("lat", "lon").
- Ambiguity must fail: If spatial dimensions cannot be uniquely determined, verbs must raise a clear error instead of guessing.
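A minimal sketch of how the dimension rules above could be enforced. The name pick_spatial_dims is hypothetical and is not the shared infer_spatial_dims helper, whose implementation may differ. It takes a plain tuple of dimension names (for example da.dims), so it runs without xarray.

```python
PREFERRED_DIMS = ("y", "x")
FALLBACK_DIMS = ("lat", "lon")


def pick_spatial_dims(dims):
    """Return (y_dim, x_dim) for a cube's dimension names, or raise ValueError.

    Hypothetical illustration of the cube invariants; not the shared helper.
    """
    dims = tuple(dims)
    if "time" not in dims:
        raise ValueError(f"cube has no 'time' dimension: {dims}")

    has_preferred = all(name in dims for name in PREFERRED_DIMS)
    has_fallback = all(name in dims for name in FALLBACK_DIMS)

    # Both naming schemes present: refuse to guess which pair is spatial.
    if has_preferred and has_fallback:
        raise ValueError(f"ambiguous spatial dimensions: {dims}")
    if has_preferred:
        return PREFERRED_DIMS
    if has_fallback:
        return FALLBACK_DIMS
    raise ValueError(f"no recognizable spatial dimensions: {dims}")


print(pick_spatial_dims(("time", "y", "x")))
print(pick_spatial_dims(("time", "lat", "lon")))
```

A cube that carries both naming schemes, or only half of one pair such as ("time", "y", "lon"), raises rather than picking a pair silently.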
CRS inference contract
Spatial verbs must use the shared infer_epsg helper and respect the following precedence:
1. da.attrs["epsg"]
2. Scalar coordinate named "epsg"
3. da.rio.crs.to_epsg() (if available)
4. da.attrs["crs"] parsed via pyproj
5. Required fallback heuristic:
   - If dims are ("lat", "lon") ⇒ EPSG:4326.
   - If dims are ("y", "x") and max(|x|) <= 180 and max(|y|) <= 90 ⇒ EPSG:4326.
   - Else raise a clear ValueError (unknown CRS).
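The precedence above can be sketched roughly as follows. This is an illustration only: infer_epsg_sketch is a hypothetical name, and the shared infer_epsg helper may be implemented differently. The function is duck-typed, so it accepts anything exposing attrs, coords, and dims. The SimpleNamespace objects below are stand-ins for real cubes so that the example runs without xarray. Step 4 assumes pyproj is installed and imports it only when needed.

```python
from types import SimpleNamespace


def infer_epsg_sketch(da):
    """Resolve an EPSG code following the contract precedence (illustrative)."""
    # 1. An explicit attribute wins.
    if "epsg" in da.attrs:
        return int(da.attrs["epsg"])
    # 2. Scalar coordinate named "epsg".
    if "epsg" in da.coords:
        return int(da.coords["epsg"])
    # 3. rioxarray accessor, only if it is available and a CRS is set.
    crs = getattr(getattr(da, "rio", None), "crs", None)
    if crs is not None and crs.to_epsg() is not None:
        return int(crs.to_epsg())
    # 4. Free-form "crs" attribute parsed by pyproj (imported lazily).
    if "crs" in da.attrs:
        from pyproj import CRS

        code = CRS.from_user_input(da.attrs["crs"]).to_epsg()
        if code is not None:
            return int(code)
    # 5. Fallback heuristic based on dimension names and coordinate ranges.
    dims = tuple(da.dims)
    if "lat" in dims and "lon" in dims:
        return 4326
    if "y" in dims and "x" in dims and "x" in da.coords and "y" in da.coords:
        max_x = max(abs(float(v)) for v in da.coords["x"])
        max_y = max(abs(float(v)) for v in da.coords["y"])
        if max_x <= 180 and max_y <= 90:
            return 4326
    raise ValueError("unknown CRS: set attrs['epsg'] or an 'epsg' coordinate")


# Stand-ins for cubes (hypothetical; real callers would pass a DataArray).
gridmet_like = SimpleNamespace(attrs={}, coords={}, dims=("time", "lat", "lon"))
sentinel2_like = SimpleNamespace(
    attrs={}, coords={"epsg": 32613}, dims=("time", "y", "x")
)
print(infer_epsg_sketch(gridmet_like))    # 4326, via the fallback heuristic
print(infer_epsg_sketch(sentinel2_like))  # 32613, via the scalar coordinate
```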
Geometry reprojection rule
Reproject vector inputs into the cube's CRS once. Never reproject the cube to the vector CRS within verbs; downstream operations assume cube grids stay untouched.
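One way to apply this rule is sketched below, assuming pyproj and shapely are installed and that the vector's EPSG code is already known. The helper name to_cube_crs is hypothetical and is not part of the shared helpers.

```python
from pyproj import Transformer
from shapely.geometry import Point
from shapely.ops import transform


def to_cube_crs(geometry, vector_epsg, cube_epsg):
    """Reproject a shapely geometry into the cube's CRS (hypothetical helper)."""
    if vector_epsg == cube_epsg:
        # Already in the cube CRS: nothing to do.
        return geometry
    # always_xy=True keeps (x, y) = (lon, lat) ordering regardless of the
    # axis order declared by the CRS definition.
    transformer = Transformer.from_crs(vector_epsg, cube_epsg, always_xy=True)
    return transform(transformer.transform, geometry)


# A lon/lat point moved into UTM zone 13N, the CRS of a Sentinel-2-like cube.
point_utm = to_cube_crs(Point(-105.0, 40.0), 4326, 32613)
print(point_utm.x, point_utm.y)
```

Calling this once per verb invocation, before any pixel tests, keeps the cube grid untouched and avoids repeated transforms inside loops.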
Spatial membership semantics
- Boundary-inclusive: Membership is defined by covers(Point(x, y)); contains() is disallowed because it excludes boundary pixels.
- Correctness-first path: Use prepared geometry + per-pixel point tests as the reference implementation.
- Optional fast path: rasterio.features.rasterize(..., all_touched=True) may be used only when fast=True. Default must be fast=False, and the docs for verbs must call out the approximation.
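A sketch of the correctness-first path, assuming shapely and numpy are available and that the coordinate values mark the pixel locations to test (typically pixel centers, which is an assumption here). The function name membership_mask is hypothetical.

```python
import numpy as np
from shapely.geometry import Point, box
from shapely.prepared import prep


def membership_mask(geometry, xs, ys):
    """Boolean mask of shape (len(ys), len(xs)); True where geometry covers the pixel."""
    prepared = prep(geometry)  # prepared geometry speeds up repeated tests
    mask = np.zeros((len(ys), len(xs)), dtype=bool)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # covers() is boundary-inclusive; contains() would drop edge pixels.
            mask[i, j] = prepared.covers(Point(x, y))
    return mask


square = box(0, 0, 2, 2)
mask = membership_mask(square, xs=[0, 1, 2, 3], ys=[0, 1, 2])
print(mask)

# The boundary point (0, 1) shows why contains() is disallowed.
edge = Point(0, 1)
print(square.covers(edge), square.contains(edge))  # True False
```

The per-pixel loop is slow on large grids. That cost is the reason the rasterize fast path exists, and the reason it is kept opt-in.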
Time alignment contract (event-based)
For cube timestamp t, select the latest geometry date <= t after day-level normalization. This pairs each cube timestamp with the most recent event polygon dated on or before it, so a verb never uses a geometry from the future.
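The selection rule can be sketched with the standard library alone. The function name select_geometry_date is hypothetical. Returning None when no geometry date precedes the cube timestamp is an assumption, since the contract does not specify that case.

```python
from bisect import bisect_right
from datetime import datetime


def select_geometry_date(cube_time, geometry_times):
    """Return the latest geometry day <= the cube day, or None if there is none."""
    # Day-level normalization: drop the time-of-day component on both sides.
    cube_day = cube_time.date()
    geometry_days = sorted({g.date() for g in geometry_times})
    # bisect_right gives the count of geometry days <= cube_day.
    position = bisect_right(geometry_days, cube_day)
    if position == 0:
        return None  # assumption: no earlier geometry means no match
    return geometry_days[position - 1]


events = [datetime(2020, 8, 1, 18, 0), datetime(2020, 8, 3, 6, 0)]
# Same calendar day matches even though 06:00 is later than 00:00.
print(select_geometry_date(datetime(2020, 8, 3, 0, 0), events))
print(select_geometry_date(datetime(2020, 8, 2, 23, 59), events))
print(select_geometry_date(datetime(2020, 7, 31, 12, 0), events))
```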
Dataset compatibility table
| Dataset | Expected grid | Metadata quirks | Contract behavior |
|---|---|---|---|
| gridMET | lat/lon in degrees | CRS metadata often missing | Default to EPSG:4326 via fallback heuristic. |
| PRISM | lat/lon in degrees | CRS metadata often missing | Same as gridMET: fallback to EPSG:4326. |
| Sentinel-2 NDVI | Projected meters (UTM) | EPSG commonly stored as scalar coord | Use scalar EPSG; reproject vectors to cube CRS. |
How to use this contract
Use the shared helpers from cubedynamics.fire_time_hull until a dedicated module is extracted. Example invocation inside a verb:

    from cubedynamics.fire_time_hull import infer_spatial_dims, infer_epsg, normalize_dates

    def clip(da, geometry, *, fast=False):
        y_dim, x_dim = infer_spatial_dims(da)
        epsg = infer_epsg(da)
        dates = normalize_dates(da["time"].values)
        # reproject geometry to epsg, enforce membership semantics, etc.

Testing with synthetic cubes (no network):

    import numpy as np
    import pandas as pd
    import xarray as xr

    def gridmet_like():
        return xr.DataArray(
            np.zeros((2, 2, 2)),
            dims=("time", "lat", "lon"),
            coords={"time": pd.date_range("2020-01-01", periods=2)},
            name="tmean",
        )

    def prism_like():
        return xr.DataArray(
            np.ones((2, 2, 2)),
            dims=("time", "lat", "lon"),
            coords={"time": pd.date_range("2020-02-01", periods=2)},
            name="ppt",
        )

    def sentinel2_like():
        return xr.DataArray(
            np.full((2, 2, 2), 5),
            dims=("time", "y", "x"),
            coords={
                "time": pd.date_range("2020-03-01", periods=2),
                "epsg": 32613,
            },
            name="ndvi",
        )

Required test fixtures
- gridMET-like cube: Degrees, missing CRS metadata; exercises fallback to EPSG:4326.
- PRISM-like cube: Degrees, missing CRS metadata; same fallback as gridMET.
- Sentinel-2-like cube: Projected meters with scalar epsg=326xx; ensures vectors are reprojected to cube CRS.
Non-goals
This contract does not define resampling behavior, cube reprojection strategies, fractional coverage calculations, or performance tuning beyond the optional rasterize fast path.
Reviewer checklist
- [ ] Time dimension present and normalized via normalize_dates.
- [ ] Spatial dimensions resolved by infer_spatial_dims (error on ambiguity).
- [ ] EPSG inferred via contract precedence with clear fallback/error.
- [ ] Vector geometries reprojected to cube CRS; cube grid never reprojected.
- [ ] Membership uses boundary-inclusive semantics; fast path gated behind fast=True.
- [ ] Tests cover required synthetic cube fixtures.