Appendix A — The Fusionified ML-Stats Dictionary

Fun fact!

Fusionified! A mix of flavors from various cuisines that somehow (or miraculously) works.

Machine learning and statistics overlap substantially, and this overlap is reflected in data science. It is therefore important to build solid bridges between the two disciplines so that their considerable jargon and terminology are clear. This ML-Stats dictionary (ML stands for machine learning) aims to be one of these bridges in this textbook, especially in the contexts of supervised learning and regression analysis.

Image by Gerd Altmann via Pixabay.

Below, you will find definitions highlighted in blue if they correspond to statistical terminology or in magenta if the terminology is machine learning-related. These definitions come from all definition admonitions, such as in (Definition-sample?). This colour scheme aims to bring all the terminology together so that we can switch from one field to the other easily. With practice and time, we should be able to jump back and forth when using these concepts.

Attention!

Noteworthy terms (either statistical or machine learning-related) will include a particular admonition identifying which terms are equivalent (or NOT equivalent, if that is the case!).

A

Attribute

Equivalent to:

Covariate, exogenous variable, explanatory variable, feature, independent variable, input, predictor or regressor.

Average

Equivalent to:

Expected value or mean.

B

Bayesian statistics

This statistical school of thinking also relies on the frequency of events to estimate specific parameters of interest in a population or system. Nevertheless, unlike frequentist statisticians, Bayesian statisticians use prior knowledge about the population parameters, along with the current evidence they can gather, to update their estimates of them. This evidence takes the form of the repetition of \(n\) experiments involving a random phenomenon. All these ingredients allow Bayesian statisticians to make inferences by conducting appropriate hypothesis tests, which are designed differently from their mainstream frequentist counterparts.

Under the umbrella of this approach, we assume that our governing parameters are random; i.e., they have their own sample space and probabilities associated with their corresponding outcomes. The statistical process of inference is heavily backed by probability theory, mostly in the form of Bayes' theorem (named after Reverend Thomas Bayes, an English statistician from the 18th century). This theorem uses our current evidence along with our prior beliefs to deliver a posterior distribution of our random parameter(s) of interest.
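As a minimal sketch of this prior-to-posterior update, the Python snippet below assumes a Beta prior on a success probability together with binomial evidence, a standard conjugate pair in which the posterior is again a Beta distribution. The prior Beta(2, 2), the 7 successes in \(n = 10\) experiments, and the function names are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the Beta posterior parameters after observing binomial evidence."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept as an exact fraction."""
    return Fraction(a, a + b)


# Hypothetical prior belief about the success probability: Beta(2, 2).
prior_a, prior_b = 2, 2

# Hypothetical evidence: n = 10 experiments with 7 successes and 3 failures.
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=7, failures=3)

prior_mean = beta_mean(prior_a, prior_b)
posterior_mean = beta_mean(post_a, post_b)

print(f"Prior:     Beta({prior_a}, {prior_b}), mean = {float(prior_mean):.3f}")
print(f"Posterior: Beta({post_a}, {post_b}), mean = {float(posterior_mean):.3f}")
```

Under these assumptions, the posterior mean falls between the prior mean (0.5) and the observed proportion of successes (0.7), which is one way to see how prior beliefs and current evidence are combined.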

C

Continuous random variable

Covariate

Equivalent to:

Attribute, exogenous variable, explanatory variable, feature, independent variable, input, predictor or regressor.

D

Dependent variable

In supervised learning, it is the main variable of interest we are trying to learn or predict, or equivalently, the variable we are trying to explain in a statistical inference framework.

Equivalent to:

Endogenous variable, response variable, outcome, output or target.
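To make these two vocabularies concrete, the sketch below labels both roles in a tiny made-up dataset and fits a simple least-squares line. The data values, variable names, and helper function are hypothetical and serve only to show which column plays which role.

```python
# Hypothetical toy data.
# `feature` plays the role of the regressor / independent variable (statistics)
# or the feature / input (machine learning).
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
# `target` plays the role of the response / dependent variable (statistics)
# or the target / output (machine learning).
target = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates for the line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_linear_regression(feature, target)

# Predict the target for a new, unseen value of the feature.
prediction = intercept + slope * 6.0

print(f"intercept = {intercept}, slope = {slope}, prediction at 6.0 = {prediction}")
```

Whether we describe this as explaining a response with a regressor or as predicting a target from a feature, the computation is the same.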

Discrete random variable

E

Equidispersion

Expected value

Equivalent to:

Average or mean.

Exogenous variable

Equivalent to:

Attribute, covariate, explanatory variable, feature, independent variable, input, predictor or regressor.

Explanatory variable

Equivalent to:

Attribute, covariate, exogenous variable, feature, independent variable, input, predictor or regressor.

F

Feature

Equivalent to:

Attribute, covariate, exogenous variable, explanatory variable, independent variable, input, predictor or regressor.

Frequentist statistics

This statistical school of thinking heavily relies on the frequency of events to estimate specific parameters of interest in a population or system. This frequency of events is reflected in the repetition of \(n\) experiments involving a random phenomenon within this population or system.

Under the umbrella of this approach, we assume that our governing parameters are fixed. Note that, within the philosophy of this school of thinking, we can only make precise and accurate predictions as long as the number of experiments \(n\) we repeat is as large as possible, i.e.,

\[ n \rightarrow \infty. \]
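The role of \(n\) can be illustrated with a small simulation. The sketch below assumes a fixed parameter (a population mean of 20 with a standard deviation of 5, both hypothetical values) and estimates it with the sample mean for increasing \(n\). The normal distribution, the seed, and the names used are illustrative assumptions, not something prescribed by the definition above.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 20.0  # hypothetical fixed parameter (unknown in practice)
TRUE_SD = 5.0     # hypothetical spread of the random phenomenon


def sample_mean(n):
    """Estimate the fixed mean from n simulated experiments."""
    draws = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    return sum(draws) / n


# Repeat the estimation with more and more experiments.
estimates = {n: sample_mean(n) for n in (10, 1_000, 100_000)}

for n, estimate in estimates.items():
    error = abs(estimate - TRUE_MEAN)
    print(f"n = {n:>7}: estimate = {estimate:.3f}, absolute error = {error:.3f}")
```

The parameter never changes across runs; only the estimate does, and its error typically shrinks as \(n\) grows.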

G

Generalized linear model

H

Hypothesis testing

I

Independence

Independent variable

Equivalent to:

Attribute, covariate, exogenous variable, explanatory variable, feature, input, predictor or regressor.

Input

Equivalent to:

Attribute, covariate, exogenous variable, explanatory variable, feature, independent variable, predictor or regressor.

M

Mean

Equivalent to:

Average or expected value.

Measure of central tendency

Measure of uncertainty

O

Outcome

In supervised learning, it is the main variable of interest we are trying to learn or predict, or equivalently, the variable we are trying to explain in a statistical inference framework.

Equivalent to:

Dependent variable, endogenous variable, response variable, output or target.

Output

In supervised learning, it is the main variable of interest we are trying to learn or predict, or equivalently, the variable we are trying to explain in a statistical inference framework.

Equivalent to:

Dependent variable, endogenous variable, response variable, outcome or target.

P

Parameter

It is a characteristic (numerical or even non-numerical, such as a distinctive category) that summarizes the state of our population or system of interest. Examples of population parameters include the following:

  • The average weight of children between the ages of 5 and 10 years old in states of the American west coast (numerical).
  • The variability in the height of the mature açaí palm trees from the Brazilian Amazonian jungle (numerical).
  • The proportion of defective items in the production of cellular phones in a set of manufacturing facilities (numerical).
  • The average customer waiting time to get their order in the Vancouver franchises of a well-known ice cream parlour (numerical).
  • The favourite pizza topping of vegetarian adults between the ages of 30 and 40 years old in Edmonton (non-numerical).

Note that the standard mathematical notation for population parameters uses Greek letters. Moreover, in practice, the population parameter(s) of interest will be unknown to the data scientist or researcher. Instead, they would use formal statistical inference to estimate them.
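As a rough illustration of the difference between a parameter and its estimate, the sketch below builds a small hypothetical population of favourite pizza toppings, computes one non-numerical parameter (the most common topping) and one numerical parameter (the proportion preferring mushroom), and then estimates both from a random sample. All counts, topping names, and the sample size are invented for this example; in practice we would not have access to the full population.

```python
import random
import statistics

# Hypothetical population: favourite topping of 1,000 individuals (invented counts).
population = ["mushroom"] * 500 + ["bell pepper"] * 300 + ["olive"] * 200

# Population parameters (unknown in practice).
population_mode = statistics.mode(population)                         # non-numerical
population_proportion = population.count("mushroom") / len(population)  # numerical

# What we actually observe: a random sample drawn without replacement.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, k=200)

# Sample-based estimates of the two parameters.
sample_mode = statistics.mode(sample)
sample_proportion = sample.count("mushroom") / len(sample)

print(f"Population: mode = {population_mode}, proportion = {population_proportion:.3f}")
print(f"Sample:     mode = {sample_mode}, proportion = {sample_proportion:.3f}")
```

The population quantities are fixed characteristics, whereas the sample quantities would change if we drew a different sample.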

Population

It is a whole collection of individuals or items that share distinctive attributes. As data scientists or researchers, we are interested in studying these attributes, which we assume are governed by parameters. In practice, we must be as precise as possible when defining our population so that we can frame our entire data modelling process from its earliest stages. Examples of a population could be the following:

  • Children between the ages of 5 and 10 years old in states of the American west coast.
  • Customers of musical vinyl records in the Canadian provinces of British Columbia and Alberta.
  • Avocado trees grown in the Mexican state of Michoacán.
  • Adult giant pandas in the Southwestern Chinese province of Sichuan.
  • Mature açaí palm trees from the Brazilian Amazonian jungle.

Note that the term population could be exchanged for the term system, given that certain contexts do not specifically refer to individuals or items. Instead, these contexts could refer to processes whose attributes are also governed by parameters. Examples of a system could be the following:

  • The production of cellular phones in a set of manufacturing facilities.
  • The sale process in the Vancouver franchises of a well-known ice cream parlour.
  • The transit cycle of the twelve lines of Mexico City’s subway.

Predictor

Equivalent to:

Attribute, covariate, exogenous variable, explanatory variable, feature, independent variable, input or regressor.

Probability

Let \(A\) be an event of interest in a random phenomenon, in a population or system of interest, all of whose possible outcomes belong to a given sample space \(S\). Generally, the probability of this event \(A\) happening can be mathematically denoted as \(P(A)\). Moreover, suppose we observe the random phenomenon \(n\) times, as if we were running some class of experiment; then \(P(A)\) is defined as the following ratio:

\[ P(A) = \frac{\text{Number of times event $A$ is observed}}{n}, \tag{A.1}\]

as the number of times \(n\) we observe the random phenomenon goes to infinity.

Equation A.1 will always put \(P(A)\) in the following numerical range:

\[ 0 \leq P(A) \leq 1. \]
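The ratio in Equation A.1 can be approximated by simulation for a large but finite \(n\). The sketch below assumes a fair six-sided die as the random phenomenon and takes the event \(A\) to be "the roll is even"; the die, the event, the seed, and the function names are hypothetical choices for illustration.

```python
import random


def relative_frequency(event, n, rng):
    """Proportion of n simulated die rolls in which `event` is observed."""
    hits = sum(1 for _ in range(n) if event(rng.randint(1, 6)))
    return hits / n


def is_even(roll):
    """Event A: the die shows an even number."""
    return roll % 2 == 0


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
n = 100_000
p_hat = relative_frequency(is_even, n, rng)

print(f"Relative frequency of A after n = {n} rolls: {p_hat:.4f}")
```

Because the number of times \(A\) is observed can never be negative nor exceed \(n\), the ratio always stays between 0 and 1. For a fair die, it should settle near 0.5 as \(n\) grows.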

Probability distribution

Probability mass function (PMF)

R

Random variable

Regression analysis

Regressor

Equivalent to:

Attribute, covariate, exogenous variable, explanatory variable, feature, independent variable, input or predictor.

Response variable

In supervised learning, it is the main variable of interest we are trying to learn or predict, or equivalently, the variable we are trying to explain in a statistical inference framework.

Equivalent to:

Dependent variable, endogenous variable, outcome, output or target.

S

Sample space

Let \(A\) be an event of interest in a random phenomenon in a population or system of interest. The sample space \(S\) denotes the set of all the possible random outcomes we might encounter every time we observe this random phenomenon, as if we were running some class of experiment.

Note that each of these outcomes has a probability associated with it. If we add up all these probabilities, the probability of the sample space \(S\) will be one, i.e.,

\[ P(S) = 1. \tag{A.2}\]
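Equation A.2 can be checked by enumeration on a small example. The sketch below assumes the random phenomenon is the roll of two fair six-sided dice, so that every outcome in the sample space is equally likely, and takes the event \(A\) to be "the two dice sum to 7". The dice, the event, and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every ordered pair of faces from two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Assumption: fair dice, so each outcome has the same probability.
probabilities = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}

# Adding up the probabilities of all outcomes gives P(S).
total_probability = sum(probabilities.values())

# Event A is a subset of S: the outcomes whose faces sum to 7.
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]
p_a = sum(probabilities[outcome] for outcome in event_a)

print(f"Number of outcomes in S: {len(sample_space)}")
print(f"P(S) = {total_probability}")
print(f"P(A) = {p_a}")
```

Exact fractions are used instead of floating-point numbers so that the sum over the sample space is exactly one.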

T

Target

In supervised learning, it is the main variable of interest we are trying to learn or predict, or equivalently, the variable we are trying to explain in a statistical inference framework.

Equivalent to:

Dependent variable, endogenous variable, response variable, outcome or output.

Training dataset

V

Variance