4  Unified Association Measures for Mixed-Type Variables

4.1 Introduction: The Challenge of Mixed-Type Data

Real-world datasets rarely consist of purely continuous or purely categorical variables. A typical health survey contains continuous biomarkers, ordinal symptom scales, categorical diagnoses, and binary treatment indicators. A business database combines numeric sales figures, categorical product types, ordinal customer satisfaction ratings, and binary contract status.

A helpful framing question is: How do we measure association across such mixed data in a unified, comparable way?

Chapter 2 established that different variable types require different association measures. But practitioners often face a practical problem: given a dataset with mixed types, how do we compute a single association matrix that allows us to rank relationships, identify anomalies, and prepare for network analysis or further modeling?

This chapter presents a practical framework for unified association measurement. We aim to:

  1. Survey classical type-specific measures and their limitations
  2. Introduce modern alternatives that are more flexible and often more robust
  3. Build a type-aware framework for selecting appropriate measures
  4. Demonstrate implementation on real mixed-type data

4.2 Variable Type Classification

Association is not a single concept. The way we measure relationships depends on the measurement scale of the variables involved. A correlation coefficient makes sense for two continuous variables, but it becomes misleading when applied to categorical or ordinal data. A good descriptive workflow usually begins by classifying variable types and selecting association measures accordingly.

We classify variables into five types:

Type Examples Characteristics
Continuous income, height, temperature Interval/ratio scale, numeric measurements
Discrete count data, number of events Non-negative integers, often right-skewed
Ordinal education level, Likert scales Ordered categories, rank structure matters
Categorical region, industry, color Unordered labels, no inherent order
Binary yes/no, 0/1, presence/absence Two categories, often treatment or outcome indicator

The core goal is comparability: use measures that help us rank associations across mixed types without forcing invalid assumptions.

4.3 Classical Measures by Variable Type

Classical association measures are well-understood, computationally efficient, and interpretable. However, each is designed for specific type combinations. We review the most important measures organized by the types of variables they handle.

4.3.1 Continuous-Continuous: Pearson and Spearman

Pearson’s \(r\) measures linear association between two continuous variables (Pearson 1895; Anderson 1985). It is scale-invariant, interpretable, and widely understood. However, it has significant limitations:

  • Sensitive to outliers
  • Assumes bivariate normality for inference (though we focus on description)
  • Misses nonlinear relationships (two variables can be perfectly dependent with \(r = 0\))
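The last limitation is easy to demonstrate with a minimal sketch: a perfectly deterministic but symmetric relationship yields a Pearson correlation of essentially zero.

```r
# Perfect dependence, (near-)zero Pearson correlation:
# y = x^2 on a grid symmetric about 0, so the linear component cancels
x <- seq(-3, 3, length.out = 101)
y <- x^2
cor(x, y)  # ~0 (zero up to floating-point error)
```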
# Pearson correlation on mtcars
mtcars |>
    select(mpg, wt, hp, disp) |>
    cor(method = "pearson") |>
    as.data.frame() |>
    tibble::rownames_to_column(var = "Variable") |>
    gt() |>
    fmt_number(columns = -Variable, decimals = 3)
Variable mpg wt hp disp
mpg 1.000 −0.868 −0.776 −0.848
wt −0.868 1.000 0.659 0.888
hp −0.776 0.659 1.000 0.791
disp −0.848 0.888 0.791 1.000

Spearman’s \(\rho\) ranks data first, then computes Pearson’s \(r\) on ranks (Spearman 1961). This makes it:

  • Robust to outliers
  • Sensitive to monotonic (not just linear) relationships
  • Suitable for ordinal data
# Compare Pearson and Spearman on data with nonlinear association
set.seed(42)
data_nonlin <- tibble(
  x = seq(0, 4, length.out = 100),
  y = x^2 + rnorm(100, 0, 2)
)

tibble(
  Measure = c("Pearson", "Spearman"),
  r = c(
    cor(data_nonlin$x, data_nonlin$y, method = "pearson"),
    cor(data_nonlin$x, data_nonlin$y, method = "spearman")
  )
) |>
  gt() |>
  fmt_number(columns = r, decimals = 3)
Measure r
Pearson 0.880
Spearman 0.877

Recommendation: Spearman is often a sensible default for continuous–continuous pairs due to its robustness; Pearson can be helpful when linearity is established and outliers are minimal.

4.3.2 Categorical-Categorical: Cramér’s V

For two categorical variables, Cramér’s V normalizes the chi-square statistic to \([0, 1]\) (David and Cramer 1947):

\[V = \sqrt{\frac{\chi^2}{n(k-1)}}\]

where \(k = \min(\text{rows}, \text{columns})\). This is intuitive and widely used, though it can be biased in small or sparse tables (bias-corrected variants exist).

# Cramér's V for categorical variables
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  sqrt(as.numeric(chi2) / (n * (k - 1)))
}

# Example: cylinder type vs transmission type
mtcars |>
  summarise(V = cramers_v(factor(cyl), factor(am))) |>
  gt() |>
  fmt_number(columns = V, decimals = 3)
V
0.523

4.3.3 Continuous-Categorical: Eta-Squared

When a continuous variable varies across groups defined by a categorical variable, eta-squared (\(\eta^2\)) captures the proportion of variance explained (Fisher 1990; Cohen 2013):

\[\eta^2 = \frac{\text{SS}_{\text{between}}}{\text{SS}_{\text{total}}}.\]

This is directly interpretable as effect size and ranges from 0 to 1.

# Eta-squared: mpg explained by number of cylinders
eta_squared <- function(y, group) {
  df <- tibble(y = y, group = group)
  grand_mean <- mean(df$y, na.rm = TRUE)
  ss_total <- sum((df$y - grand_mean)^2, na.rm = TRUE)
  ss_between <- df |>
    group_by(group) |>
    summarise(n = n(), mean_y = mean(y, na.rm = TRUE), .groups = "drop") |>
    mutate(ss = n * (mean_y - grand_mean)^2) |>
    summarise(ss = sum(ss)) |>
    pull(ss)
  ss_between / ss_total
}

mtcars |>
  summarise(eta2 = eta_squared(mpg, factor(cyl))) |>
  gt() |>
  fmt_number(columns = eta2, decimals = 3)
eta2
0.732

4.3.4 Ordinal Variables: Kendall’s Tau

Ordinal variables carry rank information that is often best retained. Kendall’s \(\tau\) is a natural choice for ordinal–ordinal associations and handles tied observations well (Kendall 1938). Spearman’s \(\rho\) can also be used for ordinal data.

# Ordinal example using a cut of mpg
mtcars |>
  mutate(mpg_ordinal = cut(mpg, breaks = 4, ordered_result = TRUE)) |>
  summarise(
    spearman = cor(as.numeric(mpg_ordinal), wt, method = "spearman"),
    kendall = cor(as.numeric(mpg_ordinal), wt, method = "kendall")
  ) |>
  gt() |>
  fmt_number(columns = c(spearman, kendall), decimals = 3)
spearman kendall
−0.777 −0.656

4.3.5 Binary-Binary: Phi Coefficient

Binary variables can be treated as a special case of categorical data. For two binary indicators, the phi coefficient is equivalent to Pearson correlation on 0/1 coding.

# Phi coefficient for two binary variables
phi <- function(x, y) {
  cor(as.integer(x), as.integer(y))
}

mtcars |>
  mutate(am_bin = am == 1, vs_bin = vs == 1) |>
  summarise(phi = phi(am_bin, vs_bin)) |>
  gt() |>
  fmt_number(columns = phi, decimals = 3)
phi
0.168

4.4 Modern Alternatives: Beyond Classical Measures

Classical measures work well within their designed domains but lack flexibility. We present modern association measures that offer broader applicability and often improved robustness.

4.4.1 Distance Correlation

Distance correlation (Székely, Rizzo, and Bakirov 2007) is a measure of dependence that:

  • Works for any dimension (univariate, multivariate)
  • Detects both linear and nonlinear associations
  • Equals zero if and only if the variables are independent
  • Is not sensitive to outliers in the same way as Pearson’s \(r\)

The sample distance correlation is computed from the pairwise distance matrices of the two variables: each matrix is double-centered (row means, column means, and the grand mean subtracted), and the resulting entries are correlated.

# Distance correlation via the energy package
library(energy)

# Example
mtcars |>
  summarise(
    dcor = dcor(mpg, wt),
    pearson = cor(mpg, wt, method = "pearson")
  ) |>
  gt() |>
  fmt_number(columns = c(dcor, pearson), decimals = 3)
dcor pearson
0.871 −0.868

Practical advantages:

  • Captures nonlinear relationships
  • Less sensitive to outliers than Pearson’s \(r\), though not fully robust (it is not rank-based)
  • Theoretically appealing (zero iff independent)

Disadvantages:

  • Computationally expensive for large samples
  • Less interpretable than Pearson’s \(r\) or Spearman’s \(\rho\)
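For intuition, the double-centering recipe can be sketched directly in base R. This is a pedagogical sketch: dcor_manual is an illustrative helper name, not part of the energy package, whose dcor() is the reference implementation.

```r
# Sample distance correlation "by hand" (illustrative helper, not an energy function)
dcor_manual <- function(x, y) {
  A <- as.matrix(dist(x))  # pairwise distances |x_i - x_j|
  B <- as.matrix(dist(y))
  # Double-center: subtract row and column means, add back the grand mean
  center <- function(M) sweep(sweep(M, 1, rowMeans(M)), 2, colMeans(M)) + mean(M)
  A_c <- center(A)
  B_c <- center(B)
  dcov2  <- mean(A_c * B_c)   # squared sample distance covariance
  dvar_x <- mean(A_c * A_c)
  dvar_y <- mean(B_c * B_c)
  sqrt(dcov2 / sqrt(dvar_x * dvar_y))
}

dcor_manual(mtcars$mpg, mtcars$wt)  # agrees with energy::dcor(mpg, wt)
```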

4.4.2 Maximal Information Coefficient (MIC)

The Maximal Information Coefficient (Reshef et al. 2011) measures association by finding the grid partition of the data that maximizes mutual information. It detects diverse functional relationships (linear, nonlinear, exponential, etc.) and is normalized to \([0, 1]\).

Strengths:

  • Detects diverse functional relationships
  • Symmetric and normalized
  • No assumption about relationship shape

Weaknesses:

  • Computationally demanding
  • Can be noisy with small sample sizes
  • Requires specialized software
# MIC example (requires the minerva package)
library(minerva)

# Compute MIC for a pair
set.seed(42)
x <- rnorm(200)
y <- x^2 + rnorm(200, 0, 0.5)

mic_result <- mine(x, y)
tibble(
  Measure = "MIC",
  Value = mic_result$MIC
) |>
  gt() |>
  fmt_number(columns = Value, decimals = 3)
Measure Value
MIC 0.729

4.4.3 Mutual Information

Mutual information (MI) quantifies the amount of information one variable contains about another (Shannon 1948; Cover and Thomas 2001):

\[I(X; Y) = \sum_x \sum_y p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.\]

Mutual information is:

  • Symmetric and non-negative
  • Applicable to any combination of variable types
  • Invariant to monotonic transformations of the margins
  • Zero if and only if the variables are independent

Interpretation challenge: MI is measured in “nats” or “bits” and lacks an intuitive 0-1 scale, making comparisons across variable pairs difficult without normalization.

# Normalized mutual information
mutual_info_norm <- function(x, y) {
  # Discretize continuous variables if needed
  x_cat <- if (is.numeric(x)) cut(x, breaks = 5) else factor(x)
  y_cat <- if (is.numeric(y)) cut(y, breaks = 5) else factor(y)
  
  tab <- table(x_cat, y_cat)
  px <- rowSums(tab) / sum(tab)
  py <- colSums(tab) / sum(tab)
  pxy <- tab / sum(tab)
  
  # Mutual information
  mi <- 0
  for (i in 1:nrow(pxy)) {
    for (j in 1:ncol(pxy)) {
      if (pxy[i, j] > 0) {
        mi <- mi + pxy[i, j] * log(pxy[i, j] / (px[i] * py[j]))
      }
    }
  }
  
  # Normalize by entropy
  hx <- -sum(px * log(px + 1e-10))
  hy <- -sum(py * log(py + 1e-10))
  2 * mi / (hx + hy)
}

# Example
mtcars |>
  summarise(
    nmi = mutual_info_norm(mpg, wt)
  ) |>
  gt() |>
  fmt_number(columns = nmi, decimals = 3)
nmi
0.576

4.4.4 Copula-Based Measures

Copulas separate marginal distributions from the dependence structure (Sklar 1959). Common copula-derived measures include Kendall’s tau and Spearman’s rho, which are rank-based and capture dependence structure independent of marginal scales.
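A quick way to see the copula idea concretely is to transform each margin to its pseudo-observations (ranks rescaled into the unit interval); rank correlations are unchanged by this transformation because they depend only on the ranks. The names u and v below are illustrative.

```r
# Pseudo-observations: the empirical margins mapped into (0, 1) via ranks
u <- rank(mtcars$mpg) / (nrow(mtcars) + 1)
v <- rank(mtcars$wt)  / (nrow(mtcars) + 1)

# Spearman's rho is identical on raw data and pseudo-observations
all.equal(
  cor(u, v, method = "spearman"),
  cor(mtcars$mpg, mtcars$wt, method = "spearman")
)  # TRUE
```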

# Kendall's tau and Spearman's rho
mtcars |>
  summarise(
    kendall_tau = cor(mpg, wt, method = "kendall"),
    spearman_rho = cor(mpg, wt, method = "spearman"),
    pearson_r = cor(mpg, wt, method = "pearson")
  ) |>
  gt() |>
  fmt_number(columns = everything(), decimals = 3)
kendall_tau spearman_rho pearson_r
−0.728 −0.886 −0.868

4.5 A Type-Aware Framework for Mixed Data

Given the diversity of measures, we need a practical approach that applies the most appropriate measure to each variable pair based on their types.

4.5.1 Association Measure Selection Matrix

The table below summarizes recommended measures for each type combination. This consolidated matrix serves as the primary reference for selecting appropriate measures:

X → Y Continuous Discrete Ordinal Categorical Binary
Continuous Spearman (default), Pearson, distance correlation Spearman, distance correlation Spearman, polyserial \(\eta^2\), distance correlation Point-biserial, \(\eta^2\)
Discrete Spearman, distance correlation Spearman Spearman \(\eta^2\), distance correlation \(\eta^2\)
Ordinal Spearman, polyserial Spearman Kendall’s \(\tau\), Spearman Polychoric, Cramér’s V Rank-biserial
Categorical \(\eta^2\), distance correlation \(\eta^2\), distance correlation Polychoric, Cramér’s V Cramér’s V, mutual information \(\phi\), Cramér’s V
Binary Point-biserial, \(\eta^2\) \(\eta^2\) Rank-biserial \(\phi\), Cramér’s V \(\phi\)

Rationale for selections:

  • Continuous-Continuous: Use Spearman as default (robust to outliers); add distance correlation to detect nonlinearity
  • Continuous-Categorical: Use \(\eta^2\) to measure variance explained by group membership
  • Categorical-Categorical: Use Cramér’s V (normalized chi-square)
  • Binary pairs: Use phi coefficient (equivalent to Pearson on 0/1 encoding)
  • Mixed discrete/ordinal: Prefer rank-based measures

4.5.2 Implementation: Type-Aware Association Matrix

We now implement a complete workflow that classifies variables, selects appropriate measures, and computes a unified association matrix. Note that this sketch covers only the type combinations that occur in the example below; pairs it does not handle (for instance, discrete–categorical) return NA, which is why some entries of the resulting matrix are missing.

# Define automatic variable classification
classify_variable <- function(x, n_unique_threshold = 10) {
  # Remove NA for classification
  x_clean <- x[!is.na(x)]
  n_unique <- n_distinct(x_clean)
  
  if (is.numeric(x)) {
    if (all(x_clean == as.integer(x_clean)) && min(x_clean, na.rm = TRUE) >= 0) {
      if (n_unique <= n_unique_threshold) "ordinal" else "discrete"
    } else {
      "continuous"
    }
  } else if (is.factor(x) || is.character(x)) {
    if (n_unique == 2) "binary" else "categorical"
  } else {
    "unknown"
  }
}

# Define type-aware association computation
compute_association <- function(x, y, type_x, type_y) {
  # Remove pairs with missing data
  valid_idx <- !is.na(x) & !is.na(y)
  x <- x[valid_idx]
  y <- y[valid_idx]
  
  if (length(x) == 0) return(NA_real_)
  
  # Handle numeric conversion for type checking
  if (is.logical(x)) x <- as.integer(x)
  if (is.logical(y)) y <- as.integer(y)
  
  # Continuous–Continuous, Continuous–Discrete, or Continuous–Ordinal
  if ((type_x %in% c("continuous", "discrete")) && 
      (type_y %in% c("continuous", "discrete", "ordinal"))) {
    return(cor(x, as.numeric(y), method = "spearman", use = "complete.obs"))
  }
  
  if ((type_y %in% c("continuous", "discrete")) && 
      (type_x %in% c("continuous", "discrete", "ordinal"))) {
    return(cor(as.numeric(x), y, method = "spearman", use = "complete.obs"))
  }
  
  # Continuous–Categorical or Continuous–Binary (Eta-squared)
  if ((type_x == "continuous" && type_y %in% c("categorical", "binary")) ||
      (type_y == "continuous" && type_x %in% c("categorical", "binary"))) {
    if (type_x %in% c("categorical", "binary")) {
      return(eta_squared(y, x))
    } else {
      return(eta_squared(x, y))
    }
  }
  
  # Categorical–Categorical, Categorical–Binary, or Ordinal–Categorical
  if ((type_x %in% c("categorical", "binary", "ordinal")) && 
      (type_y %in% c("categorical", "binary", "ordinal"))) {
    return(cramers_v(factor(x), factor(y)))
  }
  
  # Unhandled type combinations (e.g., discrete–categorical) fall through to NA
  return(NA_real_)
}

# Prepare mtcars with mixed types
mtcars_mixed <- mtcars |>
  mutate(
    cyl_cat = factor(cyl),
    am_bin = factor(am),
    vs_bin = factor(vs),
    gear_cat = factor(gear)
  ) |>
  select(mpg, wt, hp, cyl_cat, am_bin, gear_cat)

# Classify variables
var_types <- tibble(
  var_name = names(mtcars_mixed),
  var_type = map_chr(mtcars_mixed, classify_variable)
)

var_types |>
  gt()
var_name var_type
mpg continuous
wt continuous
hp discrete
cyl_cat categorical
am_bin binary
gear_cat categorical
# Compute association matrix
vars <- names(mtcars_mixed)
n_vars <- length(vars)
assoc_matrix <- matrix(1.0, nrow = n_vars, ncol = n_vars)
rownames(assoc_matrix) <- vars
colnames(assoc_matrix) <- vars

for (i in 1:n_vars) {
  for (j in i:n_vars) {
    if (i == j) {
      assoc_matrix[i, j] <- 1.0
    } else {
      assoc <- compute_association(
        mtcars_mixed[[i]], mtcars_mixed[[j]],
        var_types$var_type[i], var_types$var_type[j]
      )
      assoc_matrix[i, j] <- assoc
      assoc_matrix[j, i] <- assoc
    }
  }
}

# Display association matrix as table
assoc_matrix |>
  as.data.frame() |>
  tibble::rownames_to_column(var = "Variable") |>
  gt() |>
  fmt_number(columns = -Variable, decimals = 3)
Variable mpg wt hp cyl_cat am_bin gear_cat
mpg 1.000 −0.886 −0.895 0.732 0.360 0.429
wt −0.886 1.000 0.775 0.612 0.480 0.434
hp −0.895 0.775 1.000 NA NA NA
cyl_cat 0.732 0.612 NA 1.000 0.523 0.531
am_bin 0.360 0.480 NA 0.523 1.000 0.809
gear_cat 0.429 0.434 NA 0.531 0.809 1.000
# Display as heatmap
assoc_matrix |>
  as.data.frame() |>
  tibble::rownames_to_column(var = "x") |>
  pivot_longer(-x, names_to = "y", values_to = "assoc") |>
  ggplot(aes(x = x, y = y, fill = assoc)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "#3b4cc0", mid = "white", high = "#b40426",
                       # limits must span the negative rank correlations in the matrix
                       limits = c(-1, 1), breaks = seq(-1, 1, 0.5)) +
  labs(
    title = "Type-Aware Association Heatmap",
    x = NULL, y = NULL, fill = "Association"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

4.6 Normalization and Comparability

Different measures operate on different scales. Some are naturally bounded to \([0, 1]\); others are unbounded. To compare associations across variable pairs, we often need to normalize.

4.6.1 Standard Normalization Approaches

  1. Min-max scaling: Scale a measure to \([0, 1]\) using its theoretical bounds \[\text{Normalized} = \frac{\text{Measure} - \min}{\max - \min}\]
  2. Absolute value: For measures that can be negative (e.g., correlation), use absolute value if directionality is not of interest \[\text{Normalized} = |\text{Measure}|\]
  3. Fisher z-transformation: For Pearson’s \(r\) specifically, use \(z = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right)\) to stabilize variance
  4. Mutual information normalization: Divide by maximum possible MI or entropy \[\text{Normalized MI} = \frac{I(X;Y)}{\min(H(X), H(Y))}\]
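As a small illustration of item 3, the Fisher transformation is simply the inverse hyperbolic tangent; fisher_z below is an illustrative helper, not a library function.

```r
# Fisher z-transformation of a correlation coefficient (illustrative helper)
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))

fisher_z(0.5)        # 0.549..., same as atanh(0.5)
tanh(fisher_z(0.5))  # back-transform recovers r = 0.5
```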

4.6.2 Important Caveats

Normalization enables ranking but not direct comparison across measure types. A normalized value of 0.6 for Cramér’s V is not equivalent to 0.6 for Spearman’s \(\rho\). It helps to:

  • Document your normalization choice
  • Interpret results within measure families first
  • Use normalization to identify anomalies and ranking, not to claim equivalence
# Example: normalize association matrix to [0, 1]
assoc_matrix_norm <- assoc_matrix
diag(assoc_matrix_norm) <- NA  # Exclude diagonal (all 1s)

# Min-max normalize each off-diagonal value
# (when measures differ in sign, taking absolute values first is often preferable)
min_assoc <- min(assoc_matrix_norm, na.rm = TRUE)
max_assoc <- max(assoc_matrix_norm, na.rm = TRUE)

assoc_matrix_norm <- (assoc_matrix_norm - min_assoc) / (max_assoc - min_assoc)
diag(assoc_matrix_norm) <- 1  # Restore diagonal

# Heatmap of normalized values
assoc_matrix_norm |>
  as.data.frame() |>
  tibble::rownames_to_column(var = "x") |>
  pivot_longer(-x, names_to = "y", values_to = "assoc") |>
  filter(x != y) |>  # Exclude diagonal for clarity
  ggplot(aes(x = x, y = y, fill = assoc)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(direction = -1, na.value = "white") +
  labs(
    title = "Normalized Association Matrix",
    x = NULL, y = NULL, fill = "Normalized\nAssociation"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

4.7 Handling Special Cases

4.7.1 Missing Data

Associations with missing data require explicit handling:

  1. Listwise deletion: Remove all rows with any missing values (simple but wasteful)
  2. Pairwise deletion: Compute each association only from complete cases for that pair (more data, but samples vary by pair)
  3. Imputation: Replace missing values with estimates before computing associations
# Example: listwise vs pairwise deletion (identical here, since only one
# variable pair is involved; they diverge once more variables enter)
mtcars_with_na <- mtcars |>
  mutate(
    mpg = ifelse(row_number() %in% c(5, 10), NA, mpg),
    wt = ifelse(row_number() %in% c(3, 8), NA, wt)
  )

tibble(
  Method = c("Listwise", "Pairwise (mpg-wt)"),
  N = c(
    nrow(na.omit(mtcars_with_na[, c("mpg", "wt")])),
    sum(!is.na(mtcars_with_na$mpg) & !is.na(mtcars_with_na$wt))
  ),
  Correlation = c(
    cor(na.omit(mtcars_with_na[, c("mpg", "wt")])[, 1],
        na.omit(mtcars_with_na[, c("mpg", "wt")])[, 2]),
    cor(mtcars_with_na$mpg, mtcars_with_na$wt, use = "complete.obs")
  )
) |>
  gt() |>
  fmt_number(columns = c(Correlation), decimals = 3)
Method N Correlation
Listwise 28 −0.875
Pairwise (mpg-wt) 28 −0.875

4.7.2 Outliers and Robustness

Outliers can distort Pearson correlations but have less impact on rank-based measures. When outliers are present:

  • Use Spearman or Kendall instead of Pearson
  • Inspect scatterplots for unusual patterns
  • Consider robust alternatives (winsorization, trimmed correlation)
# Impact of outliers
set.seed(42)
x <- rnorm(50)
y <- 0.7 * x + rnorm(50, 0, 0.5)

# Add outliers
x_outlier <- c(x, 10, 10)
y_outlier <- c(y, -8, 12)

tibble(
  Scenario = c("No outliers", "With outliers"),
  Pearson = c(
    cor(x, y, method = "pearson"),
    cor(x_outlier, y_outlier, method = "pearson")
  ),
  Spearman = c(
    cor(x, y, method = "spearman"),
    cor(x_outlier, y_outlier, method = "spearman")
  )
) |>
  gt() |>
  fmt_number(columns = c(Pearson, Spearman), decimals = 3)
Scenario Pearson Spearman
No outliers 0.840 0.845
With outliers 0.310 0.751

4.8 Practical Workflow Summary

Here is a step-by-step workflow for building a unified association matrix:

  1. Classify variables by type (continuous, discrete, ordinal, categorical, binary)
  2. Handle missing data explicitly (document your choice)
  3. Inspect distributions for outliers, skewness, and multimodality
  4. Select measures based on the type-aware selection matrix above
  5. Compute pairwise associations with appropriate measures
  6. Normalize if comparability across pairs is desired
  7. Visualize as a heatmap or network (see later chapters)
  8. Interpret in context (avoid overstating quantitative results)

4.9 Summary and Key Takeaways

  • Variable types matter: Different measurement scales require different association measures to avoid misleading results.
  • Classical measures (Pearson, Spearman, Cramér’s V, \(\eta^2\)) are efficient, interpretable, and widely understood, but each is specialized for specific type combinations.
  • Modern alternatives (distance correlation, MIC, mutual information) offer greater flexibility and can detect nonlinear relationships, but are often more computationally expensive and less interpretable.
  • Type-aware framework: A systematic approach to selecting appropriate measures based on variable type combinations ensures valid and comparable association estimates.
  • Normalization enables ranking but does not erase conceptual differences between measures. It helps to document our choices and interpret results carefully.
  • Special cases (missing data, outliers) require explicit handling. Rank-based measures are more robust than Pearson correlation to outliers.

4.10 Looking Ahead

With a unified framework for association measurement now in hand, the next chapter extends this work to nonlinear and conditional associations. We plan to explore copulas, partial correlations, and graphical models to capture more complex dependence structures. Then, we move to network representations that visualize multivariate association patterns in a format that stakeholders can interpret and act upon.