Cloud Triangle: GitHub, Compute, and Persistent Storage

Working groups usually move between three connected places:

GitHub, where the group keeps code, documentation, meeting notes, small configuration files, small example data, and public-facing narrative.
Persistent storage, where the group keeps large datasets, model outputs, intermediate files, rasters, tables, and shared data products.
JupyterHub or another compute instance, where the group does active analysis, exploration, temporary processing, and interactive work.

A simple rule: use compute for active work, GitHub for small versioned materials, and persistent storage for large files that need to last.

What Goes Where

Item	Best home	Why
Meeting notes	GitHub	Small, readable, and versioned
Scripts and notebooks	GitHub	Reproducible and collaborative
Small configuration files	GitHub	Easy to review and track over time
Small example data	GitHub	Useful for tests, demos, and documentation
Raw rasters or large data	Persistent storage	Too large for GitHub
Intermediate model outputs	Persistent storage	Large and often regenerated
Shared tables or data products	Persistent storage	Need to be findable by the full group
Final figures	GitHub if small, persistent storage if large	Should be easy to cite and reuse
Temporary scratch files	Compute instance	Not meant for long-term storage
Final reports and public docs	GitHub / website	Public-facing and versioned

Use GitHub For Shared Memory

Small files belong in GitHub because they are easy to version and review. GitHub is a good home for:

Markdown pages in docs/
meeting notes and decisions
scripts, notebooks, and workflow files
small figures used directly on the website
metadata notes that explain where larger files live
references, citation notes, and reuse instructions

If a future participant needs to understand why the group made a decision, GitHub should give them the trail.

Use Persistent Storage For Large Files

Large files belong in persistent storage because GitHub is not designed for raw environmental data, rasters, model outputs, or large intermediate results. GitHub warns about files larger than 50 MiB and blocks files larger than 100 MiB.

Use persistent storage for:

raw data
processed data too large for GitHub
model runs and intermediate outputs
large figures, maps, or tables
shared group data products
files that need to survive after a compute instance stops

Every large data folder in persistent storage should have a small README or metadata note referenced from GitHub. That note should say what the folder contains, who created it, when it was created, how it was produced, and whether it is preliminary or ready to reuse.
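A metadata note like this can be generated the same way every time. The sketch below is one possible shape for it, not a group standard: the function name `build_data_note`, the field labels, the two allowed status values, and all example values are assumptions made for illustration.

```python
from datetime import date


def build_data_note(folder, contents, creator, created, method, status):
    """Return the text of a small metadata note for a large data folder.

    The fields mirror the questions the note should answer: what the folder
    contains, who created it, when, how, and whether it is ready to reuse.
    """
    # Assumed vocabulary for the status field; adjust to the group's own terms.
    allowed = {"preliminary", "ready to reuse"}
    if status not in allowed:
        raise ValueError(f"status must be one of {sorted(allowed)}")
    lines = [
        f"Folder: {folder}",
        f"Contents: {contents}",
        f"Created by: {creator}",
        f"Created on: {created.isoformat()}",
        f"How it was produced: {method}",
        f"Status: {status}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only; none of these describe a real dataset.
note = build_data_note(
    folder="example-project/rasters",
    contents="Example raster tiles",
    creator="Example Person",
    created=date(2024, 1, 15),
    method="Produced by a hypothetical script, scripts/make_rasters.py",
    status="preliminary",
)
print(note)
```

The returned text can be saved as a README inside the data folder and committed to GitHub alongside the code that produced the data.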

Use Compute For Active Work

JupyterHub, CyVerse VICE, or another compute instance is the place to work interactively. It is where participants open notebooks, test workflows, make plots, and inspect data.

Treat the compute instance as a workspace, not an archive. Before the end of a work session:

Push code, notebooks, notes, and small figures to GitHub.
Move large data and outputs to persistent storage.
Add or update a small GitHub note that explains where large files are stored.
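Before pushing, it can help to list the files in the workspace that are too large for GitHub. The sketch below is a minimal example under stated assumptions: the function name `find_large_files` is hypothetical, and the 50 MiB cutoff is only an illustrative default that a group may want to lower.

```python
from pathlib import Path

# Assumed cutoff for "large"; choose a value that suits the group.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def find_large_files(workspace, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than limit.

    Results are sorted largest first. Anything inside a .git directory is
    skipped because it is repository bookkeeping, not group data.
    """
    root = Path(workspace)
    large = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            if size > limit:
                large.append((path, size))
    large.sort(key=lambda pair: pair[1], reverse=True)
    return large


if __name__ == "__main__":
    # Scan the current directory and report candidates for persistent storage.
    for path, size in find_large_files("."):
        print(f"{size / (1024 * 1024):.1f} MiB  {path}")
```

Files reported by this scan are candidates to move to persistent storage. The script only lists them and does not move or delete anything.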

A Working Session Checklist

Start each work session by pulling the latest changes from GitHub. Then, before creating new outputs, decide where each file belongs:

Code or documentation change? Commit it to GitHub.
Large data or output? Save it to persistent storage.
Temporary scratch file? Keep it on the compute instance only while you need it.
Shared result? Document it in GitHub and point to the storage location if the file is large.
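The checklist above can also be written as a small decision function. This is a sketch for illustration only: the function name `session_actions`, the category names, and the 50 MiB cutoff for "large" are assumptions, and small data is sent to GitHub on the basis that small example data lives there.

```python
# Assumed cutoff for "large"; the checklist itself does not fix a number.
LARGE_BYTES = 50 * 1024 * 1024


def session_actions(kind, size_bytes=0, limit=LARGE_BYTES):
    """Return the checklist actions for one item.

    kind is one of: "code", "documentation", "data", "output",
    "scratch", "shared result" (hypothetical category names).
    """
    large = size_bytes > limit
    if kind in ("code", "documentation"):
        return ["commit to GitHub"]
    if kind in ("data", "output"):
        if large:
            return ["save to persistent storage"]
        # Small example data can live in GitHub.
        return ["commit to GitHub"]
    if kind == "scratch":
        return ["keep on the compute instance only while needed"]
    if kind == "shared result":
        actions = ["document in GitHub"]
        if large:
            actions.append("save to persistent storage")
            actions.append("record the storage location in GitHub")
        return actions
    raise ValueError(f"unknown kind: {kind!r}")


# Example: a 2 GiB shared result needs all three steps.
for step in session_actions("shared result", size_bytes=2 * 1024**3):
    print(step)
```

Raising an error for unknown categories is a deliberate choice in this sketch: an item that fits none of the checklist questions is worth discussing with the group rather than filing by default.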
