Project methods overview
Data Sources
We used an unpublished, aggregated dataset from the Linked Disturbance Group of the Forest Resilience Working Group, with the group's permission. All of the underlying data were gathered from public sources, including GEDI and MODIS.
Data Processing Steps
The code for cleaning the original data frame is in data_wrangling.r. We filtered out above-ground biomass density values greater than 500 Mg/ha to reduce outliers caused by low-quality GEDI retrievals. After filtering, the dataset contained 3,393,031 GEDI observations.
This script saves a cleaned CSV in ~/data/. All subsequent analyses load this cleaned CSV.
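The filtering step can be illustrated with a minimal sketch. The project's own cleaning code is the R script data_wrangling.r; the Python below is only a stand-in for the idea. The column names `agbd` and `shot_id`, the helper names, and the sample rows are all hypothetical.

```python
import csv
import io

AGBD_MAX = 500.0  # Mg/ha threshold stated in the text


def filter_agbd(rows, column="agbd", max_value=AGBD_MAX):
    """Keep rows whose biomass density is at most max_value.

    `column` is a hypothetical column name, not taken from the project data.
    """
    return [row for row in rows if float(row[column]) <= max_value]


def write_csv(rows, handle):
    """Write the filtered rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows for illustration only.
raw = [
    {"shot_id": "a", "agbd": "120.5"},
    {"shot_id": "b", "agbd": "812.0"},  # above the threshold, dropped
    {"shot_id": "c", "agbd": "500.0"},
]
cleaned = filter_agbd(raw)

buffer = io.StringIO()  # stands in for a file under ~/data/
write_csv(cleaned, buffer)
```

In practice the handle would be a real file opened for writing, and later analysis steps would read that file rather than the raw data.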
Data Analysis
Baseline model: Inverse Distance Weighting
As a baseline for comparison, we spatially interpolated the GEDI above-ground biomass density point estimates to approximately 1 km x 1 km pixels using Inverse Distance Weighting (IDW).
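For readers unfamiliar with IDW, the estimate at a target location is a weighted average of nearby observations, with each weight equal to the inverse of the distance raised to some power. The sketch below is a generic illustration, not the project's spatial_interpolation.R script. The power of 2, the use of every point rather than a neighborhood, and the function name are assumptions, since the text does not state these settings.

```python
import math


def idw_predict(known_points, target, power=2.0):
    """Inverse Distance Weighting estimate at `target`.

    known_points: list of (x, y, value) tuples in projected coordinates.
    target: (x, y) tuple, for example a pixel centre.
    power: distance exponent; 2 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with an observation
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up observations 1 km apart (coordinates in metres).
points = [(0.0, 0.0, 100.0), (1000.0, 0.0, 200.0)]
midpoint_estimate = idw_predict(points, (500.0, 0.0))  # equal weights
near_first = idw_predict(points, (100.0, 0.0))  # pulled toward 100
```

With millions of points, a real implementation would restrict the sum to the nearest neighbors of each pixel instead of looping over every observation.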
Random Forest Modeling
Random forest modeling was performed on CyVerse using the R spatialRF package. The modeling script is code/analysis/spatial_rf_model.R. We trained a model with 500 trees, a minimum node size of 25, and mtry (the number of candidate variables considered for splitting at each node) set to 3. A total of 13 variables were used as predictors in the model: 'aetNorm', 'defNorm', 'forestCode', 'modisTreeVeg', 'peakNDVI', 'yrsSinceFire', 'yrsSinceInsect', 'yrsSinceHotDrought', 'utm_z13n_easting', 'utm_z13n_northing', 'yrsSinceHotDrought..x..yrsSinceInsect', 'yrsSinceFire..x..yrsSinceHotDrought', and 'yrsSinceFire..x..yrsSinceInsect'.
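To make the role of mtry concrete, the sketch below draws the random subset of candidate predictors that a single tree node would consider when mtry is 3. It illustrates the general random forest idea only and is not the spatialRF implementation. The dictionary keys and the function name are illustrative, while the predictor names and hyperparameter values come from the text above.

```python
import random

# Predictor names as listed in the text.
PREDICTORS = [
    "aetNorm", "defNorm", "forestCode", "modisTreeVeg", "peakNDVI",
    "yrsSinceFire", "yrsSinceInsect", "yrsSinceHotDrought",
    "utm_z13n_easting", "utm_z13n_northing",
    "yrsSinceHotDrought..x..yrsSinceInsect",
    "yrsSinceFire..x..yrsSinceHotDrought",
    "yrsSinceFire..x..yrsSinceInsect",
]

# Hyperparameter values from the text; the key names are illustrative.
RF_SETTINGS = {"num_trees": 500, "min_node_size": 25, "mtry": 3}


def candidate_split_variables(predictors, mtry, rng):
    """Draw, without replacement, the variables one node may split on."""
    return rng.sample(predictors, mtry)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
candidates = candidate_split_variables(PREDICTORS, RF_SETTINGS["mtry"], rng)
```

Each node in each of the 500 trees repeats this draw independently, so a small mtry relative to the 13 predictors helps decorrelate the trees.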
Model Evaluation
Both models (the baseline inverse distance weighting model and the random forest model) were trained on a training set consisting of 70% of the GEDI data rows. A further 15% of the data was used as a validation set, and the final 15% was withheld as a test set. After training, we computed RMSE and R^2 on the validation set to compare the two models. The model evaluation script is code/analysis/calculate_accuracy_stats.R. It uses the CSVs generated by the spatial_rf_model.R and spatial_interpolation.R scripts.
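The split and the two accuracy statistics can be sketched as follows. This is an illustration under assumptions, not the contents of calculate_accuracy_stats.R. It assumes a simple random split of rows and defines R^2 as 1 - SS_res / SS_tot, while the text specifies neither the splitting procedure nor the R^2 definition. All function names are hypothetical.

```python
import math
import random


def split_indices(n_rows, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle row indices and split them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, roughly 15%
    return train, val, test


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


train_idx, val_idx, test_idx = split_indices(100)
```

Both models would be scored with the same `rmse` and `r_squared` calls on the same validation rows, which keeps the comparison like for like. The test rows stay untouched until a final model is chosen.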
Visualizations
Visualizations of above-ground biomass density versus fire, drought, and insect disturbances were created using ggplot2 from the tidyverse. We also explored the impact of these disturbance types on NDVI and forest type distribution, in correlation with visualizations of disturbance recovery trajectories. Above-ground biomass density was limited to 500 Mg/ha due to time constraints.
Conclusions
See our presentation for the full discussion and conclusions.
References
- Hammond, W.M., Williams, A.P., Abatzoglou, J.T. et al. Global field observations of tree die-off reveal hotter-drought fingerprint for Earth’s forests. Nat Commun 13, 1761 (2022). https://doi.org/10.1038/s41467-022-29289-2
- Wright MN, Ziegler A (2017). “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, 77(1), 1–17. doi:10.18637/jss.v077.i01.
- Benito BM (2021). spatialRF: Easy Spatial Regression with Random Forest. doi:10.5281/zenodo.4745208, R package version 1.1.3, https://blasbenito.github.io/spatialRF/.