Audience and Scope
This book focuses mainly on statistical regression analysis and its connections to its supervised learning counterpart. It is therefore not introductory material in statistics or machine learning. Some coding background in R (R Core Team 2024) and/or Python (Van Rossum and Drake 2009) is also recommended. With that in mind, familiarity with the following topics is suggested:
- Multivariable differential calculus. Certain sections of each chapter pertain to model estimation. Therefore, topics such as partial derivatives are a good asset. You can find helpful learning resources on the Master of Data Science (MDS) webpage.
- Basic Python programming. When necessary, the Python {pandas} (The Pandas Development Team 2024) library will be used to perform data wrangling. The MDS course DSCI 511 (Programming for Data Science) is an ideal example of a quick review.
- Basic R programming. Knowledge of data wrangling through the R {tidyverse} (Wickham et al. 2019) is recommended for hands-on practice via the cases provided in each chapter of this book. The MDS course DSCI 523 (Programming for Data Manipulation) is an ideal example of a quick review.
- Foundations of supervised learning. A fundamental data science paradigm to be covered pertains to prediction, which is core to machine learning. The reader should be familiar with basic terminology, such as training and testing data, overfitting, and underfitting. The MDS course DSCI 571 (Machine Learning I) provides these foundations.
- Foundations of feature and model selection. This prerequisite also relates to machine learning and its corresponding prediction paradigm. Basic knowledge of prediction accuracy and model selection tools is recommended. The MDS course DSCI 573 (Feature and Model Selection) is an ideal example of a quick review.
A Crucial Remark on Probability and Statistical Inference
If you are not fully familiar with introductory statistical concepts, particularly topics related to probability and inference, we suggest two pathways for review. The first pathway involves revisiting the following course materials:
- Foundations of probability and basic distributional knowledge: The MDS course DSCI 551 (Descriptive Statistics and Probability for Data Science) covers fundamental discrete and continuous probability distributions, which are essential components of any regression or supervised learning model.
- Foundations of frequentist statistical inference: The MDS course DSCI 552 (Statistical Inference and Computation I) addresses statistical inference, a key paradigm in this book. This involves identifying relationships between different variables within a population or system of interest using a sampled dataset. We focus exclusively on a frequentist approach utilizing tools such as parameter estimation, hypothesis testing, and confidence intervals.
The second pathway entails an in-depth review of the refresher material provided in Chapter 2, which covers critical points needed to grasp the statistical concepts presented in each of the thirteen core regression chapters. This refresher chapter aims to address the same topics outlined in the bullet points above through a practical example, with the necessary theoretical background to understand the foundations of generative modeling and statistical inference.