# Project methods overview

## Data Sources

We used unpublished aggregated data from the Linked Disturbance Group from the Forest Resilience Working Group, with the group's permission. All of the underlying data come from public sources, including GEDI and MODIS.

## Data Processing Steps

The code for cleaning the original data frame is in data_wrangling.r. We removed observations with above-ground biomass density values greater than 500 Mg/ha to reduce outliers caused by low-quality GEDI retrievals. After filtering, the dataset contained 3,393,031 GEDI observations.

The script saves a cleaned CSV file in ~/data/. All subsequent analyses load from this cleaned file.
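The filtering step can be sketched in base R as follows. This is a minimal illustration, not the contents of data_wrangling.r: the column name `agbd`, the helper `filter_agbd`, the output file name, and the choice to drop missing values are all assumptions made for the example.

```r
# Keep rows whose above-ground biomass density is at most `max_agbd` (Mg/ha).
# `agbd` is a hypothetical column name; the real data frame may differ.
# Assumption: rows with missing biomass values are dropped as well.
filter_agbd <- function(df, agbd_col = "agbd", max_agbd = 500) {
  keep <- !is.na(df[[agbd_col]]) & df[[agbd_col]] <= max_agbd
  df[keep, , drop = FALSE]
}

# Toy data standing in for the GEDI observations.
raw <- data.frame(
  agbd = c(12.5, 480, 500, 731.2, NA),
  peakNDVI = c(0.41, 0.78, 0.80, 0.83, 0.55)
)

cleaned <- filter_agbd(raw)

# Write the cleaned table to a temporary location for this example;
# the project script writes to ~/data/ instead.
out_path <- file.path(tempdir(), "gedi_cleaned.csv")
write.csv(cleaned, out_path, row.names = FALSE)
```

Values exactly equal to 500 Mg/ha are retained here, matching the description of removing values greater than 500.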

## Data Analysis

### Baseline model: Inverse Distance Weighting

As a baseline for comparison, we spatially interpolated the GEDI above-ground biomass density point estimates to approximately 1 km x 1 km pixels using Inverse Distance Weighting.
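For readers unfamiliar with the technique, Inverse Distance Weighting predicts each pixel as a weighted mean of the observed values, with weights proportional to 1/d^p, where d is the distance to each observation. The base R sketch below illustrates the idea on a toy 1 km grid. The helper `idw_predict`, the column name `agbd`, and the power p = 2 are assumptions for illustration; the settings used in spatial_interpolation.R may differ.

```r
# Inverse Distance Weighting: weighted mean of observations with weights 1 / d^power.
# Assumption: power = 2, a common default; the project setting is not stated.
idw_predict <- function(obs_xy, obs_val, target_xy, power = 2, eps = 1e-9) {
  obs_xy <- as.matrix(obs_xy)
  target_xy <- as.matrix(target_xy)
  apply(target_xy, 1, function(p) {
    d <- sqrt((obs_xy[, 1] - p[1])^2 + (obs_xy[, 2] - p[2])^2)
    hit <- d < eps
    # A target that coincides with an observation takes the observed value.
    if (any(hit)) return(mean(obs_val[hit]))
    w <- 1 / d^power
    sum(w * obs_val) / sum(w)
  })
}

# Toy point observations in UTM metres (hypothetical values).
obs <- data.frame(
  utm_z13n_easting = c(0, 1000, 0),
  utm_z13n_northing = c(0, 0, 1000),
  agbd = c(100, 200, 300)
)

# Pixel centres on a 1 km x 1 km grid.
grid <- expand.grid(
  utm_z13n_easting = seq(0, 1000, by = 1000),
  utm_z13n_northing = seq(0, 1000, by = 1000)
)

grid$agbd_idw <- idw_predict(obs[, 1:2], obs$agbd, grid)
print(grid)
```

This brute-force version computes every pairwise distance, so it only suits small examples. For millions of observations, a neighbourhood-limited implementation would be needed.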

### Random Forest Modeling

Random forest modeling was performed on CyVerse using the R spatialRF package. The modeling script is code/analysis/spatial_rf_model.R. We trained a model with 500 trees, a minimum node size of 25, and mtry (the number of candidate variables considered for splitting at each node) set to 3. A total of 13 variables were used as predictors in the model: 'aetNorm', 'defNorm', 'forestCode', 'modisTreeVeg', 'peakNDVI', 'yrsSinceFire', 'yrsSinceInsect', 'yrsSinceHotDrought', 'utm_z13n_easting', 'utm_z13n_northing', 'yrsSinceHotDrought..x..yrsSinceInsect', 'yrsSinceFire..x..yrsSinceHotDrought', and 'yrsSinceFire..x..yrsSinceInsect'.
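The hyperparameters above correspond to the `num.trees`, `min.node.size`, and `mtry` arguments of ranger, the random forest engine that spatialRF builds on. The sketch below fits a forest directly with ranger on synthetic data, rather than through spatialRF, to show how the settings fit together. The response column `agbd`, the toy data, and the treatment of the `..x..` columns as simple products are assumptions for illustration and do not reproduce spatial_rf_model.R.

```r
base_vars <- c("aetNorm", "defNorm", "forestCode", "modisTreeVeg", "peakNDVI",
               "yrsSinceFire", "yrsSinceInsect", "yrsSinceHotDrought",
               "utm_z13n_easting", "utm_z13n_northing")
interaction_vars <- c("yrsSinceHotDrought..x..yrsSinceInsect",
                      "yrsSinceFire..x..yrsSinceHotDrought",
                      "yrsSinceFire..x..yrsSinceInsect")
predictors <- c(base_vars, interaction_vars)

# Synthetic stand-in for the cleaned GEDI table (uniform random values).
set.seed(42)
n <- 400
toy <- as.data.frame(matrix(runif(n * length(base_vars)), nrow = n))
names(toy) <- base_vars

# Assumption: each interaction column is the product of its two parent columns.
for (v in interaction_vars) {
  parts <- strsplit(v, "..x..", fixed = TRUE)[[1]]
  toy[[v]] <- toy[[parts[1]]] * toy[[parts[2]]]
}

# Hypothetical response column with an arbitrary signal plus noise.
toy$agbd <- 300 * toy$peakNDVI + 50 * toy$modisTreeVeg + rnorm(n, sd = 10)

# Fit only if ranger is installed, so the sketch runs either way.
if (requireNamespace("ranger", quietly = TRUE)) {
  fit <- ranger::ranger(
    dependent.variable.name = "agbd",
    data = toy[, c("agbd", predictors)],
    num.trees = 500,     # number of trees
    min.node.size = 25,  # minimum node size
    mtry = 3,            # candidate variables per split
    seed = 42
  )
  print(fit)
}
```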

### Model Evaluation

Both models (the baseline inverse-distance weighting model and the random forest model) were trained on a training set consisting of 70% of the GEDI data rows. The remaining rows were divided evenly: 15% of the data served as a validation set and 15% were withheld as a final test set. After training, we computed RMSE and R^2 on the validation set to compare the two models. The model evaluation script is code/analysis/calculate_accuracy_stats.R. It uses the CSV files generated by the spatial_rf_model.R and spatial_interpolation.R scripts.
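The split and the two accuracy statistics can be sketched in base R as shown below. The helper names `split_rows`, `rmse`, and `r_squared` are hypothetical, the split is assumed to be a simple random shuffle of rows, and R^2 is assumed to be the coefficient of determination (1 minus the ratio of residual to total sum of squares). The definitions in calculate_accuracy_stats.R may differ.

```r
# Randomly assign row indices to 70% training, 15% validation, 15% test.
split_rows <- function(n, train = 0.70, validation = 0.15, seed = 1) {
  set.seed(seed)
  idx <- sample.int(n)
  n_train <- round(train * n)
  n_val <- round(validation * n)
  list(
    train = idx[seq_len(n_train)],
    validation = idx[n_train + seq_len(n_val)],
    test = idx[-seq_len(n_train + n_val)]  # everything left over
  )
}

# Root mean squared error between observed and predicted values.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Coefficient of determination (assumed definition of R^2).
r_squared <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

parts <- split_rows(100)
print(lengths(parts))

# Toy check: predictions that are off by a constant 1.
obs <- c(1, 2, 3, 4)
pred <- obs + 1
print(c(rmse = rmse(obs, pred), r2 = r_squared(obs, pred)))
```

Computing both statistics on the same validation rows for each model keeps the comparison like for like, and leaves the test set untouched for a final check.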

## Visualizations

Visualizations of above-ground biomass density versus fire, drought, and insect disturbances were created with ggplot2 from the tidyverse. We also explored the impact of these disturbance types on NDVI and forest type distribution, in correlation with visuals of disturbance recovery trajectories. Above-ground biomass density was limited to 500 Mg/ha due to time constraints.

## Conclusions

See our presentation for the full discussion and conclusions.

## References

- Hammond, W.M., Williams, A.P., Abatzoglou, J.T. et al. Global field observations of tree die-off reveal hotter-drought fingerprint for Earth’s forests. Nat Commun 13, 1761 (2022). https://doi.org/10.1038/s41467-022-29289-2
- Wright MN, Ziegler A (2017). “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R.” Journal of Statistical Software, 77(1), 1–17. doi:10.18637/jss.v077.i01.
- Benito BM (2021). spatialRF: Easy Spatial Regression with Random Forest. doi:10.5281/zenodo.4745208, R package version 1.1.3, https://blasbenito.github.io/spatialRF/.