Audience and Scope
This book mainly focuses on regression analysis and its supervised learning counterpart. Thus, it is not introductory statistics and machine learning material. Also, some coding background on R
(R Core Team 2024) and/or Python
(Van Rossum and Drake 2009) is recommended. That said, the following topics are suggested as fundamental reviews:
- Mutivariable differential calculus and linear algebra. Certain sections of each chapter pertain to modelling estimation. Therefore, topics such as partial derivatives and matrix algebra are a great asset. You can find helpful learning resources on the MDS webpage.
- Basic
Python
programming. When necessary,Python
{pandas} (The Pandas Development Team 2024) library will be used to perform data wrangling. The MDS course DSCI 511 (Programming for Data Science) is an ideal example of a quick review.
- Basic
R
programming. Knowledge of data wrangling and plotting throughR
{tidyverse} (Wickham et al. 2019) is recommended for hands-on practice via the cases provided in each one of the chapters of this book. The MDS courses DSCI 523 (Programming for Data Manipulation) and DSCI 531 (Data Visualization I) are ideal examples of a quick review. - Foundations of probability and basic distributional knowledge. The reader should be familiar with elemental discrete and continuous distributions since they are a vital component of any given regression or supervised learning model. The MDS course DSCI 551 (Descriptive Statistics and Probability for Data Science) is an ideal example of a quick review.
- Foundations of frequentist statistical inference. One of the data science paradigms to be covered in this book is statistical inference, i.e., identifying relationships between different variables in a given population or system of interest via a sampled dataset. I only aim to cover a frequentist approach using inferential tools such as parameter estimation, hypothesis testing, and confidence intervals. The MDS course DSCI 552 (Statistical Inference and Computation I) is an ideal example of a quick review.
- Foundations of supervised learning. The second data science paradigm to be covered pertains to prediction, which is core in machine learning. The reader should be familiar with basic terminology, such as training and testing data, overfitting, underfitting, cross-validation, etc. The MDS course DSCI 571 (Machine Learning I) provides these foundations.
- Foundations of feature and model selection. This prerequisite also relates to machine learning and its corresponding prediction paradigm. Basic knowledge of prediction accuracy and variable selection tools is recommended. The MDS course DSCI 573 (Feature and Model Selection) is an ideal example of a quick review.
A further remark on probability and statistical inference
In case the reader is not 100% familiar with probabilistic and inferential topics, as discussed above, we will provide a fundamental refresher in 2 Basic Cuisine: A Review on Probability and Frequentist Statistical Inference with crucial points that are needed to follow along the statistical way each one of the chapters is delivered (more specifically for modelling estimation/training matters!).
Furthermore, this refresher will be integrated into the three big pillars that will be fully expanded in this book, more concretely in 1 Getting Ready for Regression Cooking!: a data science workflow, the right workflow flavour (inferential or predictive), and a regression toolbox.