Appendix D — The Sugartastic Regression Mind Map
Fun fact!
Sugartastic! So sweet, it could power a carnival’s worth of cotton candy machines.
The regression mind map is a key component of the philosophy behind this book, besides the data science workflow from Section 1.2 and the ML-Stats dictionary found in Appendix A. Figure D.1 shows this regression mind map split in two zones by colour: discrete and continuous. This regression mind map is handy when executing the data modelling stage from the data science workflow, as explained in Section 1.2.4. Recall the first step in this stage is to choose a suitable regression model, and we made this decision in the function of the type of outcome \(Y\) we are dealing with. That said, the distributional mind map from Figure C.1 complements the regression mind map when identifying the correct type of outcome \(Y\).
Suppose we start reading this regression map clockwise in the continuous zone. In that case, note this zone starts the cloud corresponding to Chapter 3 on the classical regression model called Ordinary Least-squares (OLS). Moreover, we can see that OLS is meant for an unbounded outcome \(Y\) (i.e., \(-\infty < Y < \infty\)). Then, we can proceed to the distributional assumption on \(Y\) in OLS, which would be Normal. Following up with the cloud corresponding to Chapter 4 on Gamma regression, we can see this model is meant for a nonnegative outcome \(Y\) (i.e., \(0 \leq Y < \infty\)) where we assume a Gamma distribution for \(Y\). This way of reading the continuous zone in the mind map will persist until the survival analysis models: Chapter 6 and Chapter 7.
Then, we can proceed to the discrete zone with the cloud corresponding to Chapter 8 on the generalized linear model (GLM) called Binary Logistic regression which aims to model a binary outcome \(Y\) (i.e., it can only take on the values \(0\) and \(1\)). Note that the Binary Logistic regression model is meant for ungrouped data where each row in the training dataset contains unique feature values. Hence, in this modelling case, we assume the outcome \(Y\) as a Bernoulli trial where \(1\) indicates a success and \(0\) indicates a failure. Then, suppose we take another clockwise case such as Chapter 10 on the GLM Classical Poisson regression, this model is suitable for count-type outcomes where equidispersion is present (i.e., the mean of the counts is equal to its corresponding variance). Finally, this model assumes that the outcome is Poisson-distributed. This way of reading the discrete zone in the mind map will persist until Chapter 15 on Ordinal Logistic Regression.