4.4 Regional modeling
We modeled data from all locations combined using a linear mixed model,
a random forest model, and a gradient boosted regression model (Figure
5). Because the Colorado site categories (degraded, treatment, reference)
are not located along similar streams, unlike those in the other site
datasets, we excluded Colorado from the model comparison. With the
remaining data, we fit all three models to 80% of the dataset and
reserved the remaining 20%
for model evaluation. We compared the models using the root mean square
error (RMSE) and the coefficient of determination (R²) between
the measured and predicted data. The random forest model had the lowest
RMSE of the three models, at 1.26% OC, and the highest R², at 0.68.
All models tended to overestimate the carbon content of degraded and
reference sites. Model descriptions and results are shown in Table 5.
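
As an illustration of the evaluation procedure described above, the following is a minimal sketch of an 80/20 hold-out split and the two comparison metrics. It is not the code used in this study. The function names (train_test_split, rmse, r_squared) and the sample values are hypothetical, and the sketch assumes R² is computed as one minus the ratio of residual to total sum of squares; a squared correlation coefficient would be an alternative definition.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out test_fraction of them for evaluation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # Returns (training rows, held-out evaluation rows).
    return shuffled[n_test:], shuffled[:n_test]


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (% OC); these are not data from the study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]

train_rows, test_rows = train_test_split(range(10))
print(len(train_rows), len(test_rows))
print(rmse(measured, predicted), r_squared(measured, predicted))
```

Each candidate model would be fit to the training rows only, and both metrics would then be computed on the held-out rows, so that the comparison reflects predictive performance on data the models have not seen.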