Code — Interoperability prototypes
Document the key scripts, notebooks, and services that Group 18 is building during the summit. Link directly to the files in the code/ directory (or shared cloud environments) and explain how to run them so others can replicate your steps.
Repository layout
- code/ingest/: scripts that download or synchronize external datasets.
- code/harmonize/: notebooks and utilities that translate schemas, vocabularies, and metadata.
- code/visualize/: quick dashboards, plots, or reporting helpers for communicating results.
- code/utils/: shared helpers (configuration loaders, logging, validation) that other workflows rely on.
Feel free to adjust these folders to match how your team is organizing work.
Workflows in progress
Schema harmonizer notebook
- Path: code/harmonize/schema_harmonizer.ipynb
- What it does: Aligns NOAA CDO climate fields with USGS Water Services metadata using a reusable translation table.
- Inputs: data/raw/noaa_cdo/*.csv, data/raw/usgs/*.json, and the metadata mapping in documentation/metadata-crosswalk.md.
- How to run: Launch in JupyterLab (CyVerse), update the configuration block with your date ranges and station IDs, then execute the cells top to bottom.
- Outputs: Harmonized dataset saved to data/processed/interoperable_timeseries.parquet, plus summary plots saved in docs/assets/results/.
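The translation-table idea behind the notebook can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CROSSWALK` dictionary, the `harmonize` function, and every field name in it are hypothetical placeholders, not the contents of the real metadata crosswalk or the notebook's actual code.

```python
from typing import Dict, Iterable, List

# Hypothetical translation table: source field name -> harmonized field name.
# These names are placeholders, not the real crosswalk.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "noaa_cdo": {"STATION": "station_id", "DATE": "timestamp", "PRCP": "precipitation"},
    "usgs": {"site_no": "station_id", "dateTime": "timestamp", "value": "discharge"},
}


def harmonize(records: Iterable[dict], source: str) -> List[dict]:
    """Rename fields per the crosswalk, drop unmapped fields, and tag the source."""
    mapping = CROSSWALK[source]
    harmonized = []
    for record in records:
        # Keep only fields that the crosswalk knows how to translate.
        row = {new: record[old] for old, new in mapping.items() if old in record}
        row["source"] = source
        harmonized.append(row)
    return harmonized


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    noaa_rows = [{"STATION": "STATION_A", "DATE": "2024-05-01", "PRCP": "0.0", "EXTRA": "x"}]
    usgs_rows = [{"site_no": "SITE_B", "dateTime": "2024-05-01T00:00:00", "value": "152"}]
    for row in harmonize(noaa_rows, "noaa_cdo") + harmonize(usgs_rows, "usgs"):
        print(row)
```

Keeping the table as data rather than as hard-coded renames means the same function can serve both feeds, and a new source only needs a new entry in the table.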
Vocabulary translation CLI
- Path: code/utils/vocab_translate.py
- What it does: Applies lookup tables to remap QA/QC flags and categorical descriptors.
- How to run:

      python code/utils/vocab_translate.py \
        --config config/vocab_translation.yaml \
        --input data/raw/usgs/latest.json \
        --output data/processed/usgs_translated.json

- Notes: Commit updates to config/vocab_translation.yaml so others inherit the latest mappings.
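For readers who have not opened the script, a lookup-table CLI with these three flags might be structured roughly as follows. This is a sketch, not the contents of vocab_translate.py: the function names, the assumed config shape (field name mapped to a table of old and new values), and the `qualifier` field are all assumptions, and the YAML step relies on the third-party PyYAML package.

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Define the same three flags used in the command above."""
    parser = argparse.ArgumentParser(description="Remap coded values using lookup tables.")
    parser.add_argument("--config", required=True, help="YAML file of lookup tables")
    parser.add_argument("--input", required=True, help="JSON file holding a list of records")
    parser.add_argument("--output", required=True, help="Destination for the translated JSON")
    return parser


def translate(records: List[Dict[str, Any]],
              lookups: Dict[str, Dict[Any, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the records with mapped fields remapped.

    Values missing from a lookup table pass through unchanged.
    """
    translated = []
    for record in records:
        row = dict(record)
        for field, table in lookups.items():
            if field in row:
                row[field] = table.get(row[field], row[field])
        translated.append(row)
    return translated


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    import yaml  # PyYAML; imported lazily so translate() works without it
    with open(args.config, encoding="utf-8") as fh:
        lookups = yaml.safe_load(fh)  # assumed shape: {field: {old_value: new_value}}
    with open(args.input, encoding="utf-8") as fh:
        records = json.load(fh)  # assumed shape: a list of flat records
    with open(args.output, "w", encoding="utf-8") as fh:
        json.dump(translate(records, lookups), fh, indent=2)


if __name__ == "__main__":
    # Demonstration on inline sample data; a real script would call main() here.
    sample_lookups = {"qualifier": {"A": "approved", "P": "provisional"}}
    print(translate([{"qualifier": "P", "value": 152}], sample_lookups))
```

Letting unknown values pass through unchanged is one possible design choice; a stricter variant could collect them and fail so that gaps in the mapping are noticed before the output is shared.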
Visualization dashboard
- Path: code/visualize/harmonized_summary.qmd
- What it does: Generates a Quarto dashboard comparing raw vs. harmonized feeds and highlighting anomalies.
- Publish: Render locally with quarto render code/visualize/harmonized_summary.qmd --to html, then commit the HTML to docs/assets/results/ or link to a deployed version.
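If the render-and-copy step gets repeated often, it could be wrapped in a small helper like the one below. This is a hedged sketch: `render_command` and `publish` are hypothetical names, and it assumes Quarto writes the HTML next to the source file, which may not hold if the repository is configured as a Quarto project with its own output directory or if the page depends on a separate supporting-files folder.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

SOURCE = Path("code/visualize/harmonized_summary.qmd")
RESULTS_DIR = Path("docs/assets/results")


def render_command(source: Path) -> List[str]:
    """Build the same command shown in the Publish step."""
    return ["quarto", "render", str(source), "--to", "html"]


def publish(source: Path = SOURCE, results_dir: Path = RESULTS_DIR) -> Path:
    """Render the dashboard, then copy the HTML into the results folder."""
    subprocess.run(render_command(source), check=True)  # raises if quarto fails
    rendered = source.with_suffix(".html")  # assumption: output sits beside the source
    results_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(rendered, results_dir / rendered.name))
```

Calling `publish()` from the repository root would then leave the rendered page in docs/assets/results/, ready to commit.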
Contribution checklist
- [ ] Add a docstring or short README to every script/notebook describing purpose and inputs.
- [ ] Store credentials outside the repo (use environment variables or CyVerse secrets).
- [ ] Push intermediate results to the community folder rather than committing large binaries.
- [ ] Reference outputs on the Home page so partners can quickly find deliverables.
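As an illustration of the credentials item, a script can read a secret from the environment and fail early when it is missing. The helper name `require_env` and the variable name used in the note below are hypothetical; use whatever names your team agrees on.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this workflow.")
    return value
```

A workflow would then call something like `token = require_env("NOAA_CDO_TOKEN")` at startup, so the secret never appears in a notebook cell or a committed config file.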
Log significant updates in docs/updates.md to keep the sprint timeline transparent.