The Regression Cookbook

Now with Machine Learning and Stats Flavours!

Authors

G. Alexi Rodríguez-Arelis

Andy Tai

Ben Chen

Published

November 18, 2024

Abstract
This book aims to set a common ground between machine learning and statistics regarding linear regression techniques, using Python and R, under two perspectives: inference and prediction.

Preface

Let the regression cooking begin!

Data science is a field in which we become aware of the fascinating overlap between machine learning and statistics. Data science students routinely come across machine learning and statistics concepts that differ only in name. For instance, simple terms such as weights in supervised learning (and their statistical counterpart, regression coefficients) can be misleading for students beginning their data science training. From an instructor’s perspective, in a data science program that splits its courses into machine learning in Python and statistics in R, regression courses taught in R also need to include Python packages as alternative tools. In a graduate program such as the Master of Data Science (MDS) at the University of British Columbia, this is especially critical for students whose career plans lean towards the industry job market, where Python is more heavily used.

That said, data science is a substantial synergy between machine learning and statistics. Nevertheless, many gaps between the two disciplines still need to be addressed, and closing them is imperative in a domain growing as quickly as data science. In this regard, the MDS Stat-ML dictionary has inspired us to write this textbook. The textbook lays out common ground between foundational supervised learning models from machine learning and regression models commonly used in statistics. As a first step, we explore linear modelling approaches while highlighting the different terminology found in both fields. This discussion goes beyond a purely conceptual exploration: the second step is hands-on practice via the corresponding Python packages for machine learning and R packages for statistics.
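As a small taste of the terminology overlap mentioned above, the sketch below fits the same straight line twice: once with the closed-form least-squares formulas a statistics course would use to obtain regression coefficients, and once with gradient descent on the mean squared error, as a machine learning course might do to learn weights. The toy data, the function names (`ols_coefficients`, `gradient_descent_weights`), the learning rate, and the number of steps are all assumptions made for illustration; they are not taken from the book's chapters or from any particular package.

```python
# Toy data, made up purely for illustration (roughly y = 2 + 3x plus noise).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.9, 8.2, 10.8, 14.1]


def ols_coefficients(x, y):
    """Statistics view: closed-form least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def gradient_descent_weights(x, y, learning_rate=0.05, steps=5000):
    """ML view: learn a bias and a weight by minimizing mean squared error."""
    n = len(x)
    bias, weight = 0.0, 0.0
    for _ in range(steps):
        residuals = [(weight * xi + bias) - yi for xi, yi in zip(x, y)]
        grad_weight = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        grad_bias = 2.0 / n * sum(residuals)
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return bias, weight


intercept, slope = ols_coefficients(x, y)
bias, weight = gradient_descent_weights(x, y)

print(f"regression coefficients: intercept={intercept:.4f}, slope={slope:.4f}")
print(f"learned weights:         bias={bias:.4f}, weight={weight:.4f}")
```

Under these assumptions, both routes arrive at numerically the same pair of numbers: what one field calls the intercept and slope, the other calls the bias and weight.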

Fun fact!

While thinking about possible names for this work, I was planning to name it “Machine Learning and Statistics: A Common Ground.” Nevertheless, it was quite plain and boring! That said, this whole textbook idea sounded analogous to a cookbook¹, given its heavily applied focus with theoretical sparks.

Hence, the cookbook name idea!

Les Chefs de Cuisine de la Régression

G. Alexi Rodríguez-Arelis

I'm an Assistant Professor of Teaching in the Department of Statistics and Master of Data Science at the University of British Columbia. Throughout my academic and professional journey, I've been involved in diverse fields, such as credit risk management, statistical consulting, and data science teaching. My doctoral research in statistics is primarily focused on computer experiments that emulate scientific and engineering systems via Gaussian stochastic processes (i.e., kriging regression). I'm incredibly passionate about teaching regression topics while combining statistical and machine learning contexts.

Andy Tai

I'm a Postdoctoral Teaching and Learning Fellow in the Department of Statistics and Master of Data Science at the University of British Columbia. Throughout my academic and professional journey, I've been involved in diverse fields, such as addiction psychiatry, machine learning, and data science teaching. My doctoral research in neuroscience primarily focused on using machine learning to predict the risk of fatal overdose. I am interested in leveraging data science and machine learning to solve complex problems, and I strive to inspire others to explore the vast potential of these fields.

Ben Chen

I hold a Master's degree in Data Science from the University of British Columbia, and I am passionate about educating others in the fields of statistics and data science. With experience teaching students how to use statistical methods and data science tools, I also enjoy sharing my knowledge through writing. My blog focuses on making complex statistical concepts accessible to everyone. Additionally, I've worked on a variety of data science projects, ranging from developing recommendation systems to building Generative Adversarial Network (GAN) models.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


  1. Special thanks to Jonathan Graves, who mentioned the cookbook term when this textbook was being conceptualized in its very early stages.