Multiclass Prediction and Inference

A Practical Approach

G. Alexi Rodríguez-Arelis

Department of Statistics

2024-06-05

Agenda

  1. The Master of Data Science
  2. Regression Challenges
  3. Three Pillars in Regression Teaching
  4. Data Science Workflow
  5. The Regression Mind Map
  6. Multiclass Showcase
  7. Future Work

The QR code links to the presentation’s website: https://alexrod61.github.io/ssc-2024-multiclass-prediction/

1. The Master of Data Science

  • 10-month accelerated professional program
  • 24 one-credit courses distributed across six blocks
  • A subset of these courses falls on the statistical side:
    1. DSCI 551: Descriptive Statistics and Probability for Data Science
    2. DSCI 552: Statistical Inference and Computation I (frequentist)
    3. DSCI 553: Statistical Inference and Computation II (Bayesian)
    4. DSCI 554: Experimentation and Causal Inference
    5. DSCI 561: Regression I (ordinary least squares, OLS)
    6. DSCI 562: Regression II (beyond OLS)

2. Regression Challenges

  • Let’s focus on DSCI 562
  • There are eight lectures, four labs, and two quizzes
  • How can we cover generalized linear models (GLMs); mixed-effects, local, survival, and quantile regression; and techniques for handling missing data in just eight lectures over four weeks?

3. Three Pillars in Regression Teaching

  • DSCI 562 targets regression analysis for complex data that cannot (and should not) be modelled via OLS
  • Given the amount of material to cover, lectures are usually quite dense
  • Therefore, it’s imperative to have an efficient and homogeneous teaching approach that relies on three pillars

The pillars

  1. The use of a Data Science workflow
  2. Choosing the proper workflow flavour according to the paradigm at hand: inferential or predictive
  3. Choosing an appropriate regression model based on the type of response of interest (a "when to use what" mind map)

4. Data Science Workflow

Hold on! There’s more

5. The Regression Mind Map

  • During lecture 1, we start this regression mind map with OLS

Now, with the first GLMs…

How about Multinomial Logistic regression?

Moving on to Ordinal Logistic regression

Here comes Survival Analysis!

A digression on local regression

Finalizing with Quantile regression
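The mind map routes each type of response to a regression model. As a minimal sketch, assuming a simplified set of response types, the same "when to use what" logic can be written as a Python lookup table. The dictionary name, its keys, and its labels are hypothetical illustrations, not a transcription of the course's mind map; local regression is left out because the slides treat it as a digression.

```python
# Hypothetical sketch: keys, labels, and coverage are illustrative
# assumptions, not the course's actual mind map.
REGRESSION_MIND_MAP = {
    # Continuous response, modelling the conditional mean.
    "continuous": "Ordinary least squares (OLS)",
    # Binary response (a GLM with a logit link).
    "binary": "Binary logistic regression",
    # Count response (a GLM with a log link).
    "count": "Poisson regression",
    # Unordered categorical response with more than two classes.
    "nominal": "Multinomial logistic regression",
    # Ordered categorical response.
    "ordinal": "Ordinal logistic regression",
    # Time-to-event response, possibly right-censored.
    "time_to_event": "Survival analysis",
    # Continuous response when the target is a conditional quantile
    # (e.g., the median) rather than the conditional mean.
    "continuous_quantile": "Quantile regression",
}


def suggest_model(response_type: str) -> str:
    """Look up the model family suggested for a given response type."""
    if response_type not in REGRESSION_MIND_MAP:
        raise ValueError(f"Unknown response type: {response_type!r}")
    return REGRESSION_MIND_MAP[response_type]


print(suggest_model("ordinal"))  # Ordinal logistic regression
```

A flat dictionary keeps the mapping explicit; a fuller mind map would also branch on modelling assumptions (e.g., overdispersion in count data), which this sketch ignores.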

6. Multiclass Showcase

  • Let’s check the Jupyter Book notes on Multinomial Logistic regression (the QR code links to the DSCI 562 public materials)
  • We use a Spotify-related dataset

Copyright © 2024 Spotify AB. Spotify is a registered trademark of the Spotify Group.
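To make the multiclass prediction step concrete, here is a minimal sketch, assuming the baseline-category (softmax) formulation of multinomial logistic regression and coefficients that have already been estimated. Each non-baseline class j gets a linear predictor η_j = log(P(Y = j) / P(Y = baseline)), and the class probabilities follow from exponentiating and normalizing. The function names, genre labels, features, and coefficient values are all hypothetical; they are not taken from the Spotify dataset or the DSCI 562 notes.

```python
import math


def multinomial_probs(x, coefs, baseline):
    """Class probabilities under a baseline-category multinomial logit.

    x        : list of predictor values for one observation
    coefs    : dict mapping each non-baseline class to (intercept, slopes);
               each linear predictor models log(P(class) / P(baseline))
    baseline : name of the baseline class (its linear predictor is 0)
    """
    etas = {
        label: b0 + sum(b * xi for b, xi in zip(slopes, x))
        for label, (b0, slopes) in coefs.items()
    }
    etas[baseline] = 0.0  # log-odds of the baseline versus itself
    # Softmax is shift-invariant: subtract the maximum for numerical stability.
    top = max(etas.values())
    exps = {label: math.exp(eta - top) for label, eta in etas.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_class(x, coefs, baseline):
    """Predict the class with the highest estimated probability."""
    probs = multinomial_probs(x, coefs, baseline)
    return max(probs, key=probs.get)


# Hypothetical genres and coefficients, made up for illustration only.
coefs = {"pop": (-1.0, [2.0, 0.5]), "rap": (-2.0, [3.0, -0.5])}
x = [0.8, 0.6]  # hypothetical (danceability, energy) values
print(multinomial_probs(x, coefs, baseline="rock"))
print(predict_class(x, coefs, baseline="rock"))  # pop
```

Estimation is left out on purpose; in practice the coefficients would come from a fitting routine such as statsmodels' `MNLogit` in Python or `nnet::multinom` in R. Under an inferential paradigm, the focus shifts from the predicted class to the coefficients themselves: exponentiating a slope gives the multiplicative change in the odds of class j versus the baseline for a one-unit increase in that predictor, holding the others fixed.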

7. Future Work

  • A full frequentist textbook on maximum likelihood-based approaches (in both Python and R), open to collaborations!

Questions?
