Model specifics

This page summarizes the final county-level models used for the interactive map: gradient-boosted regressors (XGBoost) on the Full_pesticides_raw feature set (all pesticide_*_kg fields plus baseline covariates). Outcomes are CDC PLACES CASTHMA and COPD prevalence (%). Full documentation lives in the repository docs/.

Specifications

Algorithm: XGBoost regressor (tuned via grouped CV)
Feature set: Full_pesticides_raw (445 pesticide_*_kg columns + 16 baseline covariates: demographics, PLACES confounders, cropland, YEAR)
Targets: Asthma (CASTHMA) and COPD (COPD), county prevalence (%)
Split: Spatial / grouped scheme with train, validation (holdout), and test sets; the map uses predictions for all counties with features (2019 layer)
Hyperparameters (holdout tuning): CASTHMA: learning_rate=0.1, max_depth=5, n_estimators=200; COPD: learning_rate=0.05, max_depth=5, n_estimators=200
Code & notebooks: modeling/model_selection.ipynb, modeling/validate_model_accuracy.py
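
A grouped split keeps all rows that share a group key in the same fold, so the model is never validated on a group it also trained on. The sketch below is a minimal, standard-library illustration of that idea, not the project's implementation: the grouping key (a state FIPS code here), the fold count, and the helper name grouped_kfold are assumptions. The hyperparameter values are the ones listed in the specifications.

```python
from collections import defaultdict

# Tuned values copied from the specifications; in a real pipeline they could
# be passed to xgboost.XGBRegressor(**FINAL_PARAMS[target]).
FINAL_PARAMS = {
    "CASTHMA": {"learning_rate": 0.1, "max_depth": 5, "n_estimators": 200},
    "COPD": {"learning_rate": 0.05, "max_depth": 5, "n_estimators": 200},
}


def grouped_kfold(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    `groups` holds one hypothetical group label per row, e.g. a state FIPS code.
    """
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    if len(rows_by_group) < n_splits:
        raise ValueError("need at least n_splits distinct groups")

    # Greedy balancing: place the largest groups first, each into the fold
    # that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(rows_by_group, key=lambda g: (-len(rows_by_group[g]), str(g)))
    for g in ordered:
        target = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[target].extend(rows_by_group[g])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train_idx, test_idx
```

Collecting the test-fold predictions from every split gives one out-of-fold prediction per training row, which is what the OOF metrics summarize.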

Developers: Allison Londeree, CJ Concepcion, Matthew Hamil, Ryan Bausback, Sunyoung Park. A model card (following Mitchell et al.) is in the repository.

Performance metrics

Regression metrics on county-level prevalence (%). OOF = out-of-fold predictions on the training split; holdout = external validation split (n = 1,219 county-years per target).
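
For reference, the error metrics can be computed from paired actual and predicted values as sketched below. This is a minimal standard-library illustration, not the project's validation script; the function name regression_metrics is hypothetical, and R² is included as a common companion metric (that the page reports it is an assumption).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired actual/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }
```

Because prevalence is expressed in percent, RMSE and MAE are in percentage points: an MAE of 0.25 means predictions miss the reported prevalence by a quarter of a percentage point on average.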

Cross-validation (OOF)

Target | RMSE | MAE | R²
CASTHMA | 0.345 | 0.261 | 0.879
COPD | 0.724 | 0.548 | 0.902

External holdout validation

Target | RMSE | MAE | R² | RMSE 95% CI | n
CASTHMA | 0.389 | 0.287 | 0.835 | [0.366, 0.410] | 1219
COPD | 0.765 | 0.574 | 0.885 | [0.728, 0.803] | 1219
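
This page does not state how the RMSE 95% CI was obtained. One common choice is a percentile bootstrap over holdout rows, sketched below under that assumption; the function names, the number of resamples, and the seed are all hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error of paired values."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for RMSE.

    Rows are resampled with replacement; the interval is read off the
    alpha/2 and 1 - alpha/2 quantiles of the resampled RMSE values.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Resampling individual rows treats county-years as independent. If the same county appears in several years, resampling whole counties would be the more conservative variant.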

Fit: predicted vs. actual

Four panels: CASTHMA and COPD, each with cross-validation (OOF) and final validation (holdout). Points near the 45° line indicate close agreement between model predictions and CDC PLACES prevalence. Regenerate with python modeling/plot_final_full_pesticides_xgboost_results.py.

Figure: Final XGBoost (Full_pesticides_raw), predicted vs. actual prevalence (%). 2x2 scatter plots of actual vs. predicted prevalence for CASTHMA and COPD under cross-validation and holdout validation.

Equity view

Exploratory view of prediction error by subgroup (e.g. income, urban–rural, race proxy). This supports discussion in the model card; it is not a causal fairness audit. Regenerate with python modeling/plot_equity_gap_values_final_full_pesticides_xgboost.py and related notebooks as documented in modeling/.

Figure: MAE gap (subgroup vs. overall), final validation holdout, Full pesticides XGBoost.

Gap = group MAE - overall MAE, computed within each condition; a positive gap means worse error for that group. Proxies: median income tertiles (poverty proxy); metro vs. non-metro; Majority-BIPOC (pct_white < 50%).

CASTHMA (overall MAE = 0.274)

Group | Gap | n
Low income | +0.060 | 204
Mid income | -0.038 | 203
High income | -0.022 | 203
Urban (metro) | +0.002 | 230
Rural (non-metro) | -0.001 | 380
Majority-BIPOC | +0.236 | 66
Majority-white | -0.029 | 544

COPD (overall MAE = 0.430)

Group | Gap | n
Low income | +0.088 | 204
Mid income | -0.047 | 203
High income | -0.041 | 203
Urban (metro) | -0.001 | 230
Rural (non-metro) | +0.001 | 380
Majority-BIPOC | +0.203 | 66
Majority-white | -0.025 | 544
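
The gap definition (group MAE minus overall MAE) can be sketched as follows. This is a minimal standard-library illustration, not the project's plotting script; the function name mae_gaps and the group labels are hypothetical.

```python
from collections import defaultdict


def mae_gaps(y_true, y_pred, groups):
    """Return (overall_mae, {group: (gap, n)}) with gap = group MAE - overall MAE.

    `groups` holds one subgroup label per row (for example an income tertile).
    """
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    overall = sum(abs_err) / len(abs_err)

    err_by_group = defaultdict(list)
    for g, e in zip(groups, abs_err):
        err_by_group[g].append(e)

    gaps = {
        g: (sum(errs) / len(errs) - overall, len(errs))
        for g, errs in err_by_group.items()
    }
    return overall, gaps
```

When the groups partition the rows, the n-weighted gaps sum to zero by construction. This serves as a quick consistency check on reported values: for the CASTHMA income tertiles, 0.060 × 204 - 0.038 × 203 - 0.022 × 203 ≈ 0.06, which is zero to within rounding.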

Disclaimer

Estimates are ecological and associative. They are for planning and research, not individual diagnosis or legal use. See the model card for full limitations and ethical considerations.