Hands-on_Ex04(2) - Visual Statistical Analysis

Author

Hoa Nguyen Phuong

Modified

June 17, 2025

1 Learning Outcome

In this hands-on exercise, we will gain hands-on experience in using:

  • ggstatsplot package to create visual graphics with rich statistical information.
  • performance package to visualise model diagnostics.
  • parameters package to visualise model parameters.

2 Visual Statistical Analysis with ggstatsplot

ggstatsplot is an extension of the ggplot2 package for creating graphics in which the details of statistical tests are included in the plots themselves. It aims:

- to provide alternative statistical inference methods by default.
- to follow best practices for statistical reporting. For all statistical tests reported in the plots, the default template abides by the [APA](https://my.ilstu.edu/~jhkahn/apastats.html) gold standard for statistical reporting.

[Figure: example results from a robust t-test]

3 Getting started

3.1 Installing and launching R packages

In this exercise, ggstatsplot and tidyverse will be used.

pacman::p_load(ggstatsplot, tidyverse)

3.2 Importing data

DIY

Import Exam_data.csv using read_csv() from the readr package, which is part of tidyverse.

exam <- read_csv("data/Exam_data.csv")

3.3 One-sample test: gghistostats() method

In the code chunk below, gghistostats() is used to build a visual of a one-sample test on English scores.

set.seed(1234)

gghistostats(
  data = exam,
  x = ENGLISH,
  type = "bayes",
  test.value = 60,
  xlab = "English scores"
)

Default information:

statistical details / Bayes Factor / sample sizes / distribution summary

3.4 Unpacking the Bayes Factor

  • A Bayes factor is the ratio of the likelihood of one particular hypothesis to the likelihood of another. It can be interpreted as a measure of the strength of evidence in favour of one theory among two competing theories.

  • The Bayes factor gives us a way to evaluate the data in favour of a null hypothesis, and to use external information to do so. It tells us what the weight of the evidence is in favour of a given hypothesis.

  • When we are comparing two hypotheses, H1 (the alternative hypothesis) and H0 (the null hypothesis), the Bayes factor is often written as B10. It can be defined mathematically as:

    \[ B_{10} = \frac{\text{likelihood of data given } H_1}{\text{likelihood of data given } H_0} = \frac{P(D \mid H_1)}{P(D \mid H_0)} \]

  • The Schwarz criterion (also known as the Bayesian information criterion, BIC) is one of the easiest ways to calculate a rough approximation of the Bayes factor.
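As a rough illustration of the Schwarz (BIC) approximation, the sketch below compares H0 (mean fixed at 60) with H1 (mean estimated from the data) using B10 ≈ exp((BIC0 - BIC1) / 2). The scores are simulated and hypothetical, not the exam data, and this is only an approximation; it is not claimed to be how ggstatsplot computes the Bayes factor shown in its plots.

```r
# Hypothetical scores, simulated for illustration only (not Exam_data.csv)
set.seed(1234)
scores <- rnorm(100, mean = 65, sd = 15)
n <- length(scores)
mu0 <- 60  # test value under H0

# Residual sums of squares under each hypothesis
rss0 <- sum((scores - mu0)^2)           # H0: mean fixed at mu0
rss1 <- sum((scores - mean(scores))^2)  # H1: mean estimated from the data

# BIC for a normal model, dropping constants shared by both models.
# k = number of free parameters (H0: sigma only; H1: mean and sigma)
bic0 <- n * log(rss0 / n) + 1 * log(n)
bic1 <- n * log(rss1 / n) + 2 * log(n)

# BIC approximation to the Bayes factor in favour of H1
bf10 <- exp((bic0 - bic1) / 2)
bf10
```

A value of bf10 above 1 favours H1, and a value below 1 favours H0.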

3.5 How to interpret Bayes Factor

A Bayes Factor can be any positive number.

One of the most common interpretations is the one below, first proposed by Harold Jeffreys (1961) and slightly modified by Lee and Wagenmakers in 2013.

Bayes Factor

| If B10 is… | Then you have |
|---|---|
| > 100 | Extreme evidence for H1 |
| 30 - 100 | Very strong evidence for H1 |
| 10 - 30 | Strong evidence for H1 |
| 3 - 10 | Moderate evidence for H1 |
| 1 - 3 | Anecdotal evidence for H1 |
| 1 | No evidence |
| 1/3 - 1 | Anecdotal evidence for H0 |
| 1/10 - 1/3 | Moderate evidence for H0 |
| 1/30 - 1/10 | Strong evidence for H0 |
| 1/100 - 1/30 | Very strong evidence for H0 |
| < 1/100 | Extreme evidence for H0 |
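The interpretation scale can be turned into a small lookup helper. The sketch below is illustrative only: interpret_bf10() is a hypothetical function name, values below 1 are read as evidence for H0, and the treatment of boundary values (for example, whether exactly 3 counts as "anecdotal" or "moderate") is an assumption, since the scale does not specify it.

```r
# Hypothetical helper: map B10 values to the evidence categories of the scale.
# Assumption: each interval is closed on the left, so 3 counts as "moderate".
interpret_bf10 <- function(bf10) {
  stopifnot(is.numeric(bf10), all(is.finite(bf10)), all(bf10 > 0))
  breaks <- c(0, 1/100, 1/30, 1/10, 1/3, 1, 3, 10, 30, 100, Inf)
  labels <- c(
    "Extreme evidence for H0",
    "Very strong evidence for H0",
    "Strong evidence for H0",
    "Moderate evidence for H0",
    "Anecdotal evidence for H0",
    "Anecdotal evidence for H1",
    "Moderate evidence for H1",
    "Strong evidence for H1",
    "Very strong evidence for H1",
    "Extreme evidence for H1"
  )
  out <- as.character(cut(bf10, breaks = breaks, labels = labels, right = FALSE))
  out[bf10 == 1] <- "No evidence"  # B10 exactly 1 favours neither hypothesis
  out
}

interpret_bf10(c(0.2, 1, 5, 150))
```

Because B01 = 1 / B10, a B10 of 0.2 is the same as a B01 of 5, which is why it is read as moderate evidence for H0.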

3.6 Two-sample mean test: ggbetweenstats()

In the code chunk below, ggbetweenstats() is used to build a visual for two-sample mean test of Maths scores by gender.

ggbetweenstats(
  data = exam,
  x = GENDER,
  y = MATHS,
  type = "np",
  message = FALSE
)

Default information: statistical details / sample sizes / distribution summary. The Bayes factor message is shown only for parametric tests, so it does not appear with type = "np".

3.7 One-way ANOVA test: ggbetweenstats() method

In the code chunk below, ggbetweenstats() is used to build a visual for One-way ANOVA test on English scores by race.

ggbetweenstats(
  data = exam,
  x = RACE,
  y = ENGLISH,
  type = "p",
  mean.ci = TRUE,
  pairwise.comparisons = TRUE,
  pairwise.display = "s",
  p.adjust.method = "fdr",
  message = FALSE
)

The pairwise.display argument controls which pairwise comparisons are shown:

  • "ns" -> only non-significant
  • "s" -> only significant
  • "all" -> everything
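To see what an "fdr" adjustment does to a set of pairwise p-values before the significant ones are picked out, here is a minimal base R sketch. The raw p-values are made up for illustration, and base R's p.adjust() is used as a stand-in; the sketch does not reproduce the internals of ggbetweenstats().

```r
# Hypothetical raw p-values from four pairwise comparisons
raw_p <- c(0.001, 0.02, 0.03, 0.5)

# "fdr" is an alias for the Benjamini-Hochberg ("BH") method in p.adjust()
adj_p <- p.adjust(raw_p, method = "fdr")
adj_p

# Comparisons that remain significant at the 5% level after adjustment
significant <- adj_p < 0.05
significant
```

Only the comparisons flagged TRUE would be drawn when the display is restricted to significant pairs.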

3.7.1 ggbetweenstats() - Summary of tests

The following (between-subjects) tests are carried out for each type of analysis.

| Type | No. of groups | Test |
|---|---|---|
| Parametric | > 2 | Fisher's or Welch's one-way ANOVA |
| Non-parametric | > 2 | Kruskal-Wallis one-way ANOVA |
| Robust | > 2 | Heteroscedastic one-way ANOVA for trimmed means |
| Bayes Factor | > 2 | Fisher's ANOVA |
| Parametric | 2 | Student's or Welch's t-test |
| Non-parametric | 2 | Mann-Whitney U test |
| Robust | 2 | Yuen's test for trimmed means |
| Bayes Factor | 2 | Student's t-test |
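For orientation, the parametric and non-parametric tests named in this summary have direct base R counterparts. The sketch below runs them on simulated, hypothetical data (the data frame and its column names are made up); it is meant only to show which classical test each row refers to, not how ggbetweenstats() computes its results.

```r
# Hypothetical data, simulated for illustration only
set.seed(1234)
df <- data.frame(
  score  = rnorm(90, mean = 70, sd = 10),
  gender = factor(rep(c("Female", "Male"), times = 45)),
  group  = factor(rep(c("A", "B", "C"), each = 30))
)

# Two groups
welch_t <- t.test(score ~ gender, data = df)       # Welch's t-test (base R default)
mann_w  <- wilcox.test(score ~ gender, data = df)  # Mann-Whitney U test

# More than two groups
welch_aov <- oneway.test(score ~ group, data = df)   # Welch's one-way ANOVA (default)
kruskal   <- kruskal.test(score ~ group, data = df)  # Kruskal-Wallis test

# Collect the p-values for a side-by-side look
p_values <- c(
  welch_t   = welch_t$p.value,
  mann_w    = mann_w$p.value,
  welch_aov = welch_aov$p.value,
  kruskal   = kruskal$p.value
)
round(p_values, 3)
```

Setting var.equal = TRUE in t.test() or oneway.test() gives the Student's and Fisher's versions instead.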

[Table: effect sizes (and confidence intervals) available for each type of test]

[Table: summary of multiple pairwise comparison tests supported in ggbetweenstats()]

3.8 Significance test of correlation: ggscatterstats()

In the code chunks below, ggscatterstats() is used to build visuals for a significance test of correlation between pairs of subject scores. The examples practise with different subjects from the data and with the labelling arguments label.var and label.expression.

Maths x Science; label the race of students scoring 90 or above in both subjects.

ggscatterstats(
  data = exam,
  x = MATHS,
  y = SCIENCE,
  label.var = RACE,
  label.expression = MATHS >= 90 & SCIENCE >= 90,
  marginal = FALSE
)

English x Science; label the gender of students scoring 90 or above in both subjects.

ggscatterstats(
  data = exam,
  x = ENGLISH,
  y = SCIENCE,
  label.var = GENDER,
  label.expression = ENGLISH >= 90 & SCIENCE >= 90,
  marginal = FALSE
)

Maths x English, without labels.
ggscatterstats(
  data = exam,
  x = MATHS,
  y = ENGLISH,
  marginal = FALSE
)
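The kind of test behind a correlation plot can be checked by hand with base R. The sketch below runs a Pearson correlation test on simulated, hypothetical scores (the vectors maths and english are made up and are not the exam data); it is only a rough counterpart to a parametric correlation analysis.

```r
# Hypothetical scores, simulated for illustration only
set.seed(1234)
maths   <- rnorm(100, mean = 70, sd = 15)
english <- 0.6 * maths + rnorm(100, mean = 25, sd = 10)

# Pearson correlation test: estimate, confidence interval and p-value
ct <- cor.test(maths, english, method = "pearson")

ct$estimate  # Pearson's r
ct$conf.int  # 95% confidence interval for r
ct$p.value   # p-value for H0: true correlation is 0
```

Changing method to "spearman" gives a rank-based alternative.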

3.9 Significance test of association (dependence): ggbarstats()

In the code chunk below, the Maths scores are binned into a 4-class variable using cut().

#load library
library(dplyr)

exam1 <- exam |>
  mutate(MATHS_bins = 
           cut(MATHS,
               breaks = c(0,60,75,85,100)))
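It helps to check how cut() treats the boundary values with these breaks. In the short sketch below, the score vector is hypothetical and chosen to sit on the break points. By default the intervals are open on the left and closed on the right, so a score of exactly 0 falls outside every bin and becomes NA unless include.lowest = TRUE is set.

```r
# Same breaks as in the binning step; scores are hypothetical boundary values
breaks <- c(0, 60, 75, 85, 100)
scores <- c(0, 60, 61, 75, 85, 100)

# Default: intervals are (lower, upper], so 0 is not in any bin
bins_default <- cut(scores, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left: [0,60]
bins_closed <- cut(scores, breaks = breaks, include.lowest = TRUE)

levels(bins_default)
data.frame(scores, bins_default, bins_closed)
```

If the data contain no zero scores, both versions produce the same bins.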

ggbarstats() is used to build a visual for a significance test of association. Its usage is listed first, followed by examples.

Usage

ggbarstats(
  data,
  x,
  y,
  counts = NULL,
  type = "parametric",
  paired = FALSE,
  results.subtitle = TRUE,
  label = "percentage",
  label.args = list(alpha = 1, fill = "white"),
  sample.size.label.args = list(size = 4),
  digits = 2L,
  proportion.test = results.subtitle,
  digits.perc = 0L,
  bf.message = TRUE,
  ratio = NULL,
  conf.level = 0.95,
  sampling.plan = "indepMulti",
  fixed.margin = "rows",
  prior.concentration = 1,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  legend.title = NULL,
  xlab = NULL,
  ylab = NULL,
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  package = "RColorBrewer",
  palette = "Dark2",
  ggplot.component = NULL,
  ...
)

English score bins by gender.

exam_E <- exam |>
  mutate(ENGLISH_bins = 
           cut(ENGLISH,
               breaks = c(0,60,75,85,100)))

ggbarstats(exam_E,
           x = ENGLISH_bins,
           y = GENDER)

Science score bins by race.

exam_S <- exam |>
  mutate(SCIENCE_bins = 
           cut(SCIENCE,
               breaks = c(0,60,75,85,100)))

ggbarstats(exam_S,
           x = SCIENCE_bins,
           y = RACE)

Maths score bins by gender, using the exam1 data created earlier.

ggbarstats(exam1,
           x = MATHS_bins,
           y = GENDER)
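The association between a binned score and a grouping variable can also be tested by hand. The sketch below applies base R's Pearson chi-squared test to simulated, hypothetical data (the scores and genders are made up, not the exam data); it is a rough counterpart to a parametric test of association on a contingency table, not a reproduction of the ggbarstats() output.

```r
# Hypothetical data, simulated for illustration only
set.seed(1234)
n <- 300
scores <- pmin(pmax(round(rnorm(n, mean = 70, sd = 15)), 1), 100)  # clamp to 1..100
gender <- factor(sample(c("Female", "Male"), n, replace = TRUE))

# Bin the scores with the same breaks used in this section
bins <- cut(scores, breaks = c(0, 60, 75, 85, 100))

# Contingency table of score bin by gender, then Pearson's chi-squared test
tab <- table(bins, gender)
res <- chisq.test(tab)

tab
res$statistic  # chi-squared statistic
res$parameter  # degrees of freedom: (4 - 1) * (2 - 1)
res$p.value
```

A small p-value would suggest that the distribution across score bins differs by gender; with independently simulated data like this, no real association is expected.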