Data & Storage
Use this page to keep track of every dataset used in the Historic Biodiversity & Human Infrastructure sprint. Add links as soon as you adopt a source so teammates know where to find it and what preparation steps are required.
Quick links
- Group 5 CyVerse folder: i:/iplant/home/shared/esiil/Innovation_summit/Group_5
- Shared code repository: code/ directory
- Working notes: documentation/group-notes.md
Datasets in play
Dataset | Description | Access / Path | Notes |
---|---|---|---|
GBIF historic occurrences | Filtered occurrences (1900–present) for focal taxa within the study area. | Group_5/shared_data/gbif_occurrences/ |
Export as CSV + GeoJSON; include citation metadata. |
Land cover & habitat connectivity | NLCD, USGS PAD-US cores, and connectivity rasters for fragmentation analysis. | Group_5/shared_data/habitat_connectivity/ |
Large rasters — sync via gocmd with --diff . |
Transportation & energy corridors | NTAD transportation layers plus EIA transmission corridors. | Group_5/shared_data/infrastructure_corridors/ |
Document version/date in README. |
Community & stewardship layers | Tribal lands, conservation easements, and community-identified priority sites. | Group_5/shared_data/community_layers/ |
Confirm sharing permissions before publishing maps. |
Add rows as you incorporate new data. If a dataset lives outside CyVerse, include the public URL and note authentication requirements.
Handling sensitive or large data
- Keep raw downloads in CyVerse rather than GitHub. Use lightweight samples if you need to demonstrate structure in this repo.
- Record any restrictions (e.g., license, data sharing agreements) directly in the table above.
- When generating outputs, save deliverables to
Group_5/outputs/
with timestamps so others can trace your workflow.
Reproducibility checklist
- [ ] Each dataset listed above includes a pointer to the exact file or folder location.
- [ ] Processing scripts in
code/
mention required inputs/outputs in their docstrings or README entries. - [ ] Visuals on the homepage cite the data sources that produced them.
Keeping this page current helps external reviewers and future teammates understand how to rebuild the analysis.