Appendix 3.

Brief Description of Machine Learning Algorithms Included in the SuperLearner Library

Regularization (R package: glmnet) [22]
  • Penalized regression reduces overfitting due to collinear independent variables

Elastic net
  • Ridge regression shrinks coefficients for collinear independent variables toward zero but does not fully eliminate any independent variable

  • Lasso regression shrinks coefficients for collinear independent variables all the way to zero, eliminating their contributions to the predicted probability

  • Elastic net regression mixes the two penalties, so coefficients for collinear independent variables can be shrunk toward zero (retaining a contribution to the predicted probability) and/or all the way to zero (eliminating their contributions to the predicted probability)

  • The mixing parameter (alpha), which balances the ridge and lasso penalties, is set somewhere between 0.01 and 0.99 (a minimal sketch follows this list)
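
A minimal, hypothetical sketch of an elastic net fit with glmnet follows; the simulated data, variable names, and the choice of alpha = 0.5 are illustrative assumptions, not values from the source analysis.

library(glmnet)

# Illustrative simulated data (assumption, not from the source analysis)
set.seed(1)
n <- 500
x <- matrix(rnorm(n * 10), ncol = 10)             # 10 predictors
x[, 2] <- x[, 1] + rnorm(n, sd = 0.1)             # make x1 and x2 collinear
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 3]))  # binary outcome

# alpha = 0 gives ridge, alpha = 1 gives lasso; intermediate values mix the two
fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)
coef(fit, s = "lambda.min")                       # some coefficients shrunk exactly to 0
head(predict(fit, newx = x, s = "lambda.min", type = "response"))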

Spline

Adaptive splines (R package: earth) [23]

Adaptive polynomial splines (R package: polspline) [24]
  • Adaptive spline regression flexibly captures interactions and linear and nonlinear associations

  • Linear segments (splines) of varying slopes are connected and smoothed to create piecewise curves (basis functions)

  • Final fit is built using a stepwise procedure that selects the optimal combination of basis functions

  • earth and polymars are generally similar but differ in the order in which basis functions (eg, linear vs nonlinear) are added to build the final model (a minimal sketch of both follows this list)
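
A minimal, hypothetical sketch of both spline learners follows; the simulated data and the settings (eg, degree = 2) are illustrative assumptions, not values from the source analysis.

library(earth)
library(polspline)

# Illustrative simulated data with a nonlinear association (assumption)
set.seed(2)
n <- 500
x <- data.frame(x1 = runif(n), x2 = runif(n))
y <- sin(4 * x$x1) + x$x1 * x$x2 + rnorm(n, sd = 0.2)

# earth: the forward pass adds hinge basis functions (and their interactions
# when degree > 1); the backward pass prunes them
fit_earth <- earth(y ~ x1 + x2, data = cbind(y = y, x), degree = 2)
summary(fit_earth)                     # selected basis functions

# polymars: stepwise addition and deletion of piecewise-linear basis functions
fit_pm <- polymars(y, as.matrix(x))
head(predict(fit_pm, x = as.matrix(x)))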

Decision trees

Random forest (R package: ranger) [25]
  • Decision tree methods capture interactions and nonlinear associations

  • Independent variables are partitioned (based on their values), and the splits are stacked to build decision trees and assemble an aggregate “forest”

  • Random forest builds numerous trees in bootstrapped samples and generates an aggregate prediction by averaging across trees (reducing overfitting)

  • Suitable for large data sets but may be unstable and prone to overfitting (a minimal sketch follows this list)
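
A minimal, hypothetical sketch with ranger follows; the simulated data and num.trees = 500 are illustrative assumptions, not values from the source analysis.

library(ranger)

# Illustrative simulated data with an interaction (assumption)
set.seed(3)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500), x3 = rnorm(500))
df$y <- factor(rbinom(500, 1, plogis(df$x1 * df$x2)))

# Each tree is grown on a bootstrap sample, with a random subset of predictors
# considered at each split; class probabilities are averaged across trees
fit <- ranger(y ~ ., data = df, num.trees = 500, probability = TRUE)
head(predict(fit, data = df)$predictions)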

Gradient boosting (R package: xgboost) [26,27]
  • xgboost implements an extreme gradient boosting decision tree algorithm

  • Final predictions are formed from models built sequentially (using a gradient descent algorithm to minimize loss), with each new model fit to resolve the residual error of the existing models (a minimal sketch follows this list)
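
A minimal, hypothetical sketch with xgboost follows; the simulated data and tuning values (nrounds, max_depth, eta) are illustrative assumptions, not values from the source analysis.

library(xgboost)

# Illustrative simulated data with a nonlinear association (assumption)
set.seed(4)
x <- matrix(rnorm(500 * 5), ncol = 5)
y <- rbinom(500, 1, plogis(x[, 1] - x[, 2]^2))

# Each round fits a small tree to the gradient of the loss (the residual
# error of the current ensemble) and adds it with a shrunken weight (eta)
fit <- xgboost(data = x, label = y, objective = "binary:logistic",
               nrounds = 100, max_depth = 3, eta = 0.1, verbose = 0)
head(predict(fit, x))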

Neural networks (R package: nnet) [28]
  • Connections between predictors and the outcome are modeled as a network

  • Predictors affect the outcome through intermediate layers

  • Weights are assigned to connections

  • Neural networks capture interactions and nonlinear associations (a minimal sketch follows this list)

  • Low interpretability
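
A minimal, hypothetical sketch with nnet follows, together with a call combining all of the above learners via the SuperLearner package; the simulated data and tuning values (size, decay) are illustrative assumptions, although the SL.* names are the package's built-in wrappers for these algorithms.

library(nnet)

# Illustrative simulated data (assumption, not from the source analysis)
set.seed(5)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
df$y <- rbinom(500, 1, plogis(df$x1 * df$x2))

# One hidden layer of 5 units sits between the predictors and the outcome;
# a weight is estimated for each connection, and decay penalizes large weights
fit <- nnet(y ~ x1 + x2, data = df, size = 5, decay = 0.1,
            maxit = 200, trace = FALSE)
head(predict(fit, df))

# Hypothetical ensemble call: SuperLearner cross-validates each candidate
# learner and weights them to form the final prediction
library(SuperLearner)
sl <- SuperLearner(Y = df$y, X = df[, c("x1", "x2")], family = binomial(),
                   SL.library = c("SL.glmnet", "SL.earth", "SL.polymars",
                                  "SL.ranger", "SL.xgboost", "SL.nnet"))
sl$coef   # ensemble weights assigned to each candidate learner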