4.4 Regional modeling
We modeled the combined data from all locations using a linear mixed model, a random forest model, and a gradient boosted regression model (Figure 5). We excluded Colorado from the model comparison because its site categories (degraded, treatment, reference) were not located along similar streams, as they were in the other site datasets. From the remaining data, we used 80% of the observations to train all three models and reserved the other 20% for model evaluation. We compared the models using the root mean square error (RMSE) and the coefficient of determination (R²) between measured and predicted values. The random forest model had the lowest RMSE of the three models (1.26% OC) and the highest R² (0.68). All three models tended to overestimate carbon content at degraded and reference sites. Model descriptions and results are shown in Table 5.
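The hold-out evaluation described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study. It assumes a simple random 80/20 split with a fixed seed, because the text does not state whether the split was stratified by site or category. It also assumes R² is computed as 1 − SS_res/SS_tot rather than as a squared correlation. The function names (`train_test_split`, `rmse`, `r_squared`, `bias_by_category`) and the example values are hypothetical. `bias_by_category` shows one way to check the overestimation pattern by site category, where a positive mean of (predicted − measured) indicates overestimation.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train_test_split(rows: Sequence, train_frac: float = 0.8, seed: int = 0) -> Tuple[list, list]:
    """Hypothetical helper: shuffle rows reproducibly, then split into training and hold-out sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def _check(measured: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must pair up one-to-one and contain at least one value.
    if not measured or len(measured) != len(predicted):
        raise ValueError("measured and predicted must be non-empty and equal length")


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the response (here % OC)."""
    _check(measured, predicted)
    sq_err = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(sq_err / len(measured))


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    _check(measured, predicted)
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def bias_by_category(measured: Sequence[float], predicted: Sequence[float],
                     categories: Sequence[str]) -> Dict[str, float]:
    """Mean (predicted - measured) per site category; positive values mean overestimation."""
    _check(measured, predicted)
    errors: Dict[str, List[float]] = defaultdict(list)
    for m, p, c in zip(measured, predicted, categories):
        errors[c].append(p - m)
    return {c: sum(e) / len(e) for c, e in errors.items()}


if __name__ == "__main__":
    # Made-up hold-out values (% OC), for illustration only.
    measured = [2.0, 3.5, 1.0, 4.0]
    predicted = [2.5, 3.0, 1.5, 4.5]
    categories = ["degraded", "reference", "degraded", "treatment"]
    print(rmse(measured, predicted))  # 0.5
    print(round(r_squared(measured, predicted), 3))  # 0.824
    print(bias_by_category(measured, predicted, categories))
```

In a comparison like this one, the same hold-out rows from a single call to `train_test_split` would be scored for each of the three models, so that differences in RMSE and R² reflect the models rather than the split.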