Hands-on_Ex04(3) - Visualising Uncertainty

Author

Hoa Nguyen Phuong

Modified

June 17, 2025

1 Learning Outcome

In this chapter, we will do hands-on in creating statistical graphics for visualising uncertainty.

to plot statistics error bars by using ggplot2
to plot interactive error bars by combining ggplot2, plotly and DT
to create advanced using ggdist
to create hypothetical outcome plots (HOPs) with ungeviz package

2 Getting Started

2.1 Installing and loading packages

The following R packages will be used for this exercise:

tidyverse: a family of R packages for data science process
plotly: can create interactive plot
gganimate: can create animation plot
DT: can display interactive HTML table
crosstalk: to implement cross-widget interactions (currently linked brushing and filtering)
ggdist: to visualise distribution and uncertainty

devtools::install_github("wilkelab/ungeviz")

pacman::p_load(ungeviz, plotly, corsstalk,
               DT, ggdist, ggridges,
               colorspace, gganimate, tidyverse)

2.2 Importing data

The Exam_data.csv dataset will be used for this exercise.

exam <- read_csv("data/Exam_data.csv")

3 Visualising the uncertainty of point estimates: ggplot2 methods

A point estimate is a single number, such as a mean score. Uncertainty, is expressed as standard error, confidence interval, or credible interval. Don’t confuse the uncertainty of a point estimate with the variation in the sample.

Now, we will plot error bars of MATHS scores by RACE using the data provided in exam tibble data frame.

Code below will be used to derive the necessary summary statistics:

Code
Mathematics

my_sum <- exam %>%
  #group the observation by RACE; group_by() from dplyr package
  group_by(RACE) %>% 
  
  #compute the count of observations, mean, standard deviation
  summarise( 
    n=n(),
    mean=mean(MATHS),
    sd=sd(MATHS)
  ) %>%
  
  #derive standard error of Maths by RACE
  mutate(se=sd/sqrt(n-1))

The output is saved as a tibble data table called my_sum.

Summary of code above

group_by() from dplyr package is used to group the observation by RACE
summarise() is used to compute count of observations, mean, standard deviation
mutate() is used to derive standard error of MATHS by RACE

🔑🔑🔑 Next code is used to display my_sum tibble data frame in an HTML table format.

Code
Table

knitr:::kable(head(my_sum),
              format = 'html')

RACE	n	mean	sd	se
Chinese	193	76.50777	15.69040	1.132357
Indian	12	60.66667	23.35237	7.041005
Malay	108	57.44444	21.13478	2.043177
Others	9	69.66667	10.72381	3.791438

3.1 Plotting standard error bars of point estimates

Now, plotting the standard error bars for the mean score of MATHS by RACE.

Plot
Code

Summary of code above

The error bars are computed by using the formula mean +/- se
For geom_point(), it is important to indicate stat="identity"

ggplot(my_sum) + 
  geom_errorbar(
    aes(x = RACE,
        ymin=mean-se,
        ymax=mean+se),
    width = 0.2,
    colour = "blue",
    alpha = 0.9,
    size = 0.5
  ) +
  geom_point(
    aes(x = RACE,
        y = mean),
    stat = "identity",
    color = "red",
    size = 2.5,
    alpha = 1
  ) +
  ggtitle("Standard Error of Mean MATHS Score by RACE")

3.2 Plotting confidence interval of point estimates

Instead of plotting error bars of point estimates, we can also plot Confidence Intervals of the mean scores of MATHS by RACE.

Plot
Code

ggplot(my_sum) +
  geom_errorbar(
    aes(x = reorder(RACE, -mean),
        ymin=mean-1.96*se,
        ymax=mean+1.96*se),
    width=0.2,
    colour="darkgreen",
    alpha=0.9,
    size=1
  ) + 
  geom_point(aes(
    x = RACE,
    y = mean),
    stat="identity",
    color="red",
    alpha=1,
    size=5
    ) +
  labs(x = "MATHS score",
       title = "95% Confidence Interval of Mean MATHS Score by RACE")

Summary of code

The Confidence Intervals are computed by using the formula mean +/- 1.96*se
The error bars are sorted using the average maths scores
labs() argument of ggplot2 is used to change the x-axis label

3.3 Visualising the uncertainty of point estimates with interactive error bars

To plot interactive error bars for the 99% Confidence Interval of the mean score for MATHS by RACE.

The primary use for SharedData is to be passed to Crosstalk-compatible widgets in place of a data frame. Each SharedData$new(...) call makes a new “group” of widgets that link to each other, but not to widgets in other groups. You can also use a SharedData object from Shiny code in order to react to filtering and brushing from non-widget visualizations (like ggplot2 plots).

Plot
Code

#install.packages("leaflet")
library(shiny)
library(crosstalk)
library(leaflet)
library(DT)

shared_df = SharedData$new(my_sum)

bscols(widths = c(4.5,8),
       ggplotly((ggplot(shared_df) +
                   geom_errorbar(
                     aes(x = reorder(RACE, -mean),
                         ymin = mean-2.58*se,
                         ymax = mean+2.58*se),
                     width = 0.2,
                     colour = "blue",
                     alpha = 0.8,
                     size = 0.6
                   ) +
                   geom_point(
                     aes(x = RACE,
                         y = mean,
                         text = paste("Race: ", `RACE`,
                                      "<br>N: ", `n`,
                                      "<br>Avg. Scores: ", round(mean, digits = 2),
                                      "<br>95% CI:[",
                                      round((mean-2.58*se), digits = 2), ",",
                                      round((mean+2.58*se), digits = 2), "]")),
                        stat = "identity",
                        color = "pink",
                        size = 2.5,
                        alpha = 1) +
                   xlab("Race") +
                   ylab("Average Scores") +
                   theme_minimal() +
                   theme(axis.text.x = element_text(
                     angle = 45, vjust = 0.8, hjust = 1),
                     plot.title = element_text(size = 8, face = "bold")) +
                   ggtitle("99% Confidence Interval of <br>Average MATHS Score by RACE")),
                tooltip = "text"),
       DT::datatable(shared_df,
                     rownames = FALSE,
                     class = "compact",
                     width = "150%",
                     options = list(pageLength = 10,
                                    scrollX=T),
                     colnames = c("No. of pupils",
                                  "Avg. scores",
                                  "Std Dev",
                                  "Std Error")) %>%
         formatRound(columns = c('mean', 'sd', 'se'),
                     digits = 2))

4 Visualsing Uncertainty: `ggdist` package

ggdist for distribution and uncertainty visualisation:

It is an R package that provides flexible set of ggplot2 geoms and stats designed for visualising distributions and uncertainty.

It can visualise both frequentist and Bayesian uncertainty. Uncertainty visualization can be unified through the perspective of distribution visualization.

Frequentist model: one visualises confidence distribution or bootstrap distributions (see vignette (“freq-uncertainty-vis”) ::: column-margin ## Setup for Frequentist uncertainty visualization

Frequentist uncertainty visualization Setup

library(dplyr)
library(tidyr)
library(ggdist)
library(ggplot2)
library(broom)
library(distributional)

theme_set(theme_ggdist())

Bayesian model: one visualises probability distributions (see tidyverse package that builds on top of ggdist)

4.1 Visualising the uncertainty of point estimates: `ggdist` methods (I)

stat_pointinterval() of ggdist is used in the code below to build a visualisation to display distribution of MATHS scores by RACE.

exam %>%
  ggplot(aes(x= RACE,
             y = MATHS)) +
  stat_pointinterval(
    color = "skyblue"
  ) +
  labs(
    title = "Visualising Confidence Intervals of Mean Scores for MATHS",
    subtitle = "Mean point + multiple-interval plot"
  )

Note

This function comes with many arguments. See next tab for example.

Added the following arguments

.width = 0.95
.point = median
.interval = qi
color = red

Show the code

theme_set(theme_bw())

exam %>%
  ggplot(aes(
    x = RACE, y = MATHS)) +
  stat_pointinterval(
    .width = 0.95,
    .point = median,
    .interval = qi,
    color = "red") +
  labs(title = "Visualising Confidence Intervals of Median Scores for MATHS by RACE",
       subtitle = "Median point + multiple-interval plot")

4.2 Practice

Tip

DIY to show 95% and 99% confidence intervals.

Show the code

exam %>%
ggplot(aes(x = RACE, y = MATHS)) +
  stat_pointinterval(
    show.legend = FALSE,
    .width = c(0.95, 0.99),
    aes(interval_color = stat(level)),
    point_fill = "grey",
    point_colour = "grey",
    point_size = 5
  ) +
  #Define colors of the intervals
  scale_color_manual(
    values = c("steelblue", "pink"),
    aesthetics = "interval_color"
  ) +
  labs(
    title = "Visualising Confidence Intervals of Mean Scores for MATHS by RACE",
    subtitle = "Mean point + multiple-interval plot"
  ) +
  theme(
    panel.background = element_rect(fill = "transparent", color = NA),
    plot.background = element_rect(fill = "transparent", color = NA),
    legend.background = element_rect(fill = "transparent", color = NA)
  )

4.3 Visualising the uncertainty of point estimates: `ggdist` methods (II)

stat_gradientinterval() of ggdist is used in the code below to build a visualisation for displaying distribution of MATHS scores by RACE.

Show the code

exam %>%
  ggplot(
    aes(x = RACE,
        y = MATHS)) +
  stat_gradientinterval(
    fill = "skyblue",
    show.legend = TRUE
  ) +
  labs(
    title = "Visualising Confidence Intervals of Mean Score for MATHS by RACE",
    subtitle = "Gradient + interval plot"
  )

5 Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)

1️⃣ Step 1: Install ungeviz package

devtools::install_github("wilkelab/ungeviz")

2️⃣ Step 2: Launch the application in R

library(ungeviz)

Show the code

ggplot(data = exam,
       aes(x = factor(RACE), y = MATHS)) +
  geom_point(position = position_jitter(
    height = 0.3, width = 0.05),
    size = 0.6, color = "darkolivegreen", alpha = 0.6) +
  geom_hpline(data = sampler(25, group = RACE),
              height = 0.6, color = "pink") +
  theme_bw() +
  transition_states(.draw, 1, 3)

Show the code

#.draw is a generated column indicating the sample draw.

Show the code

ggplot(data = exam,
       aes(x = factor(RACE), y = ENGLISH)) +
  geom_point(position = position_jitter(
    height = 0.3, width = 0.05),
    size = 0.5, color = "skyblue", alpha = 0.6) +
  geom_hpline(data = sampler(25, group = RACE),
              height = 0.6, color = "azure4") +
  theme_bw() +
  transition_states(.draw, 1, 3)

Show the code

#.draw is a generated column indicating the sample draw.

Show the code

ggplot(data = exam,
       aes(x = factor(RACE), y = SCIENCE)) +
  geom_point(position = position_jitter(
    height = 0.3, width = 0.05),
    size = 0.4, color = "tan1", alpha = 0.6) +
  geom_hpline(data = sampler(25, group = RACE),
              height = 0.6, color = "pink1") +
  theme_bw() +
  transition_states(.draw, 1, 3)

Show the code

#.draw is a generated column indicating the sample draw.

📖 Reading resource for HOPs:

https://medium.com/hci-design-at-uw/hypothetical-outcomes-plots-experiencing-the-uncertain-b9ea60d7c740

1 Learning Outcome

2 Getting Started

2.1 Installing and loading packages

2.2 Importing data

3 Visualising the uncertainty of point estimates: ggplot2 methods

3.1 Plotting standard error bars of point estimates

3.2 Plotting confidence interval of point estimates

3.3 Visualising the uncertainty of point estimates with interactive error bars

4 Visualsing Uncertainty: ggdist package

4.1 Visualising the uncertainty of point estimates: ggdist methods (I)

4.2 Practice

4.3 Visualising the uncertainty of point estimates: ggdist methods (II)

5 Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)

4 Visualsing Uncertainty: `ggdist` package

4.1 Visualising the uncertainty of point estimates: `ggdist` methods (I)

4.3 Visualising the uncertainty of point estimates: `ggdist` methods (II)