Appendix.

Selected Machine Learning Methods for Classification of Unknown Cases into Mutually Exclusive Categories

Random Forest

  Advantages:

  • Low computational cost

  • Uses missing data to inform the model

  • Can handle a large number of records and variables

  • Provides estimates of the information gained by each input variable

  • Works well with nonlinear data

  Disadvantages:

  • Not ideal for rare outcomes

  • Very difficult to interpret individual variable contributions to the classification

  • Time-consuming hyperparameter tuning

  • Prone to overfitting
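As a minimal sketch (not taken from the paper), a random forest classifier fit with scikit-learn on synthetic data, assuming that library is available; the per-variable importances illustrate the "information gained by each input variable" noted above.

```python
# Hedged sketch: random forest classification on synthetic two-class data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real records: 500 cases, 10 variables.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Importance estimate for each input variable; the values sum to 1.
importances = clf.feature_importances_
```

Note that the importances rank variables but do not explain any single case's classification, which is the interpretability limitation listed above.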

Support Vector Machines

  Advantages:

  • Low computational cost

  • Effective when the number of variables exceeds the number of records (very wide data)

  Disadvantages:

  • Needs a clear margin of separation between outcomes (e.g., unhealthy drinking vs. low-risk drinking)

  • Time-consuming hyperparameter tuning

  • Not efficient with a large number of records
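A minimal sketch, again assuming scikit-learn: an RBF-kernel support vector machine on "wide" synthetic data (more variables than records), with the feature scaling SVMs generally require.

```python
# Hedged sketch: SVM on wide data (variables > records).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 80 records but 200 variables: wider than it is long.
X, y = make_classification(n_samples=80, n_features=200,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features first; SVM fit quality is sensitive to feature scale.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```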

Neural Networks

  Advantages:

  • Works well with nonlinear data

  • Extremely useful with a large number of predictors (high dimensionality, e.g., image data)

  • Any numeric data can be used

  Disadvantages:

  • High computational cost during training

  • Time-consuming hyperparameter tuning

  • Needs a relatively large number of records for the training set

  • Very difficult to interpret individual variable contributions to the classification

  • Must have many records per variable

  • Prone to overfitting
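A minimal sketch using scikit-learn's multilayer perceptron as the neural network (an assumption; the paper names no architecture). A comparatively large training set and an iteration cap raised for convergence reflect the training-cost and data-volume points above.

```python
# Hedged sketch: a small feed-forward neural network classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Neural networks need relatively many records, so use 1,000 here.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(
    StandardScaler(),  # networks train poorly on unscaled inputs
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```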

K-Nearest Neighbors

  Advantages:

  • Very simple construction requiring minimal specifications (i.e., hyperparameters)

  • Intuitive methodology

  Disadvantages:

  • High computational cost

  • Challenging with a large number of variables (wide data)

  • Cannot handle imbalanced data

  • Very sensitive to outliers

  • Cannot handle missing data
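A minimal sketch, assuming scikit-learn: k-nearest neighbors with scaled features, since distance-based methods are sensitive to feature scale (and, as noted above, to outliers). The single main specification is k, the number of neighbors.

```python
# Hedged sketch: k-nearest neighbors classification.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k (n_neighbors) is essentially the only hyperparameter to choose.
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```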

Decision Trees

  Advantages:

  • Can handle missing data

  • No data preprocessing needed

  • Provides a highly intuitive explanation of the prediction

  Disadvantages:

  • Highly biased toward the training set

  • Relatively inaccurate compared with other models
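A minimal sketch, assuming scikit-learn: a depth-limited decision tree, with `export_text` printing the if/else splits that make the prediction intuitively explainable. The depth limit is one way to curb the bias toward the training set noted above.

```python
# Hedged sketch: a shallow, human-readable decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth=3 keeps the tree small enough to read and limits overfitting.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Plain-text dump of the split rules, one line per branch.
rules = export_text(clf)
```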

Logistic Regression

  Advantages:

  • Common and understood by most practitioners

  • Relatively easy to implement

  • Loss function is always convex, so a global optimum can be found

  Disadvantages:

  • Proper selection of features is required

  • Cannot handle missing data

  • Needs data preprocessing and handling to cover nonlinear data

  • Cannot handle a large number of categorical predictors
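A minimal sketch, assuming scikit-learn: logistic regression on complete (no missing values) numeric data, as the method requires. Because the loss is convex, the solver converges to a global optimum.

```python
# Hedged sketch: logistic regression for binary classification.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Complete numeric data; logistic regression cannot handle missing values.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```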