::install_github("wilkelab/ungeviz") devtools
Hands-on_Ex04(3) - Visualising Uncertainty
1 Learning Outcome
In this chapter, we will do hands-on in creating statistical graphics for visualising uncertainty.
- to plot statistics error bars by using
ggplot2
- to plot interactive error bars by combining
ggplot2
,plotly
andDT
- to create advanced using
ggdist
- to create hypothetical outcome plots (HOPs) with
ungeviz
package
2 Getting Started
2.1 Installing and loading packages
The following R packages will be used for this exercise:
tidyverse
: a family of R packages for data science processplotly
: can create interactive plotgganimate
: can create animation plotDT
: can display interactive HTML tablecrosstalk
: to implement cross-widget interactions (currently linked brushing and filtering)ggdist
: to visualise distribution and uncertainty
::p_load(ungeviz, plotly, corsstalk,
pacman
DT, ggdist, ggridges, colorspace, gganimate, tidyverse)
2.2 Importing data
The Exam_data.csv dataset will be used for this exercise.
<- read_csv("data/Exam_data.csv") exam
3 Visualising the uncertainty of point estimates: ggplot2 methods
A point estimate is a single number, such as a mean score. Uncertainty, is expressed as standard error, confidence interval, or credible interval. Donβt confuse the uncertainty of a point estimate with the variation in the sample.
Now, we will plot error bars of MATHS scores by RACE using the data provided in exam tibble data frame.
Code below will be used to derive the necessary summary statistics:
<- exam %>%
my_sum #group the observation by RACE; group_by() from dplyr package
group_by(RACE) %>%
#compute the count of observations, mean, standard deviation
summarise(
n=n(),
mean=mean(MATHS),
sd=sd(MATHS)
%>%
)
#derive standard error of Maths by RACE
mutate(se=sd/sqrt(n-1))
The output is saved as a tibble data table called my_sum.
group_by()
from dplyr package is used to group the observation by RACEsummarise()
is used to compute count of observations, mean, standard deviationmutate()
is used to derive standard error of MATHS by RACE
πππ Next code is used to display my_sum tibble data frame in an HTML table format.
:::kable(head(my_sum),
knitrformat = 'html')
RACE | n | mean | sd | se |
---|---|---|---|---|
Chinese | 193 | 76.50777 | 15.69040 | 1.132357 |
Indian | 12 | 60.66667 | 23.35237 | 7.041005 |
Malay | 108 | 57.44444 | 21.13478 | 2.043177 |
Others | 9 | 69.66667 | 10.72381 | 3.791438 |
3.1 Plotting standard error bars of point estimates
Now, plotting the standard error bars for the mean score of MATHS by RACE.
- The error bars are computed by using the formula
mean +/- se
- For
geom_point()
, it is important to indicatestat="identity"
ggplot(my_sum) +
geom_errorbar(
aes(x = RACE,
ymin=mean-se,
ymax=mean+se),
width = 0.2,
colour = "blue",
alpha = 0.9,
size = 0.5
+
) geom_point(
aes(x = RACE,
y = mean),
stat = "identity",
color = "red",
size = 2.5,
alpha = 1
+
) ggtitle("Standard Error of Mean MATHS Score by RACE")
3.2 Plotting confidence interval of point estimates
Instead of plotting error bars of point estimates, we can also plot Confidence Intervals of the mean scores of MATHS by RACE.
ggplot(my_sum) +
geom_errorbar(
aes(x = reorder(RACE, -mean),
ymin=mean-1.96*se,
ymax=mean+1.96*se),
width=0.2,
colour="darkgreen",
alpha=0.9,
size=1
+
) geom_point(aes(
x = RACE,
y = mean),
stat="identity",
color="red",
alpha=1,
size=5
+
) labs(x = "MATHS score",
title = "95% Confidence Interval of Mean MATHS Score by RACE")
- The Confidence Intervals are computed by using the formula mean +/- 1.96*se
- The error bars are sorted using the average maths scores
labs()
argument ofggplot2
is used to change the x-axis label
3.3 Visualising the uncertainty of point estimates with interactive error bars
To plot interactive error bars for the 99% Confidence Interval of the mean score for MATHS by RACE.
The primary use for SharedData
is to be passed to Crosstalk-compatible widgets in place of a data frame. Each SharedData$new(...)
call makes a new βgroupβ of widgets that link to each other, but not to widgets in other groups. You can also use a SharedData
object from Shiny code in order to react to filtering and brushing from non-widget visualizations (like ggplot2 plots).
#install.packages("leaflet")
library(shiny)
library(crosstalk)
library(leaflet)
library(DT)
= SharedData$new(my_sum)
shared_df
bscols(widths = c(4.5,8),
ggplotly((ggplot(shared_df) +
geom_errorbar(
aes(x = reorder(RACE, -mean),
ymin = mean-2.58*se,
ymax = mean+2.58*se),
width = 0.2,
colour = "blue",
alpha = 0.8,
size = 0.6
+
) geom_point(
aes(x = RACE,
y = mean,
text = paste("Race: ", `RACE`,
"<br>N: ", `n`,
"<br>Avg. Scores: ", round(mean, digits = 2),
"<br>95% CI:[",
round((mean-2.58*se), digits = 2), ",",
round((mean+2.58*se), digits = 2), "]")),
stat = "identity",
color = "pink",
size = 2.5,
alpha = 1) +
xlab("Race") +
ylab("Average Scores") +
theme_minimal() +
theme(axis.text.x = element_text(
angle = 45, vjust = 0.8, hjust = 1),
plot.title = element_text(size = 8, face = "bold")) +
ggtitle("99% Confidence Interval of <br>Average MATHS Score by RACE")),
tooltip = "text"),
::datatable(shared_df,
DTrownames = FALSE,
class = "compact",
width = "150%",
options = list(pageLength = 10,
scrollX=T),
colnames = c("No. of pupils",
"Avg. scores",
"Std Dev",
"Std Error")) %>%
formatRound(columns = c('mean', 'sd', 'se'),
digits = 2))
4 Visualsing Uncertainty: ggdist
package
ggdist
for distribution and uncertainty visualisation:
It is an R package that provides flexible set of ggplot2
geoms and stats designed for visualising distributions and uncertainty.
It can visualise both frequentist and Bayesian uncertainty. Uncertainty visualization can be unified through the perspective of distribution visualization.
- Frequentist model: one visualises confidence distribution or bootstrap distributions (see vignette (βfreq-uncertainty-visβ) ::: column-margin ## Setup for Frequentist uncertainty visualization
Frequentist uncertainty visualization Setup
library(dplyr)
library(tidyr)
library(ggdist)
library(ggplot2)
library(broom)
library(distributional)
theme_set(theme_ggdist())
- Bayesian model: one visualises probability distributions (see
tidyverse
package that builds on top ofggdist
)
4.1 Visualising the uncertainty of point estimates: ggdist
methods (I)
stat_pointinterval()
of ggdist
is used in the code below to build a visualisation to display distribution of MATHS scores by RACE.
%>%
exam ggplot(aes(x= RACE,
y = MATHS)) +
stat_pointinterval(
color = "skyblue"
+
) labs(
title = "Visualising Confidence Intervals of Mean Scores for MATHS",
subtitle = "Mean point + multiple-interval plot"
)
This function comes with many arguments. See next tab for example.
Added the following arguments
- .width = 0.95
- .point = median
- .interval = qi
- color = red
Show the code
theme_set(theme_bw())
%>%
exam ggplot(aes(
x = RACE, y = MATHS)) +
stat_pointinterval(
.width = 0.95,
.point = median,
.interval = qi,
color = "red") +
labs(title = "Visualising Confidence Intervals of Median Scores for MATHS by RACE",
subtitle = "Median point + multiple-interval plot")
4.2 Practice
DIY to show 95% and 99% confidence intervals.
Show the code
%>%
exam ggplot(aes(x = RACE, y = MATHS)) +
stat_pointinterval(
show.legend = FALSE,
.width = c(0.95, 0.99),
aes(interval_color = stat(level)),
point_fill = "grey",
point_colour = "grey",
point_size = 5
+
) #Define colors of the intervals
scale_color_manual(
values = c("steelblue", "pink"),
aesthetics = "interval_color"
+
) labs(
title = "Visualising Confidence Intervals of Mean Scores for MATHS by RACE",
subtitle = "Mean point + multiple-interval plot"
+
) theme(
panel.background = element_rect(fill = "transparent", color = NA),
plot.background = element_rect(fill = "transparent", color = NA),
legend.background = element_rect(fill = "transparent", color = NA)
)
4.3 Visualising the uncertainty of point estimates: ggdist
methods (II)
stat_gradientinterval()
of ggdist
is used in the code below to build a visualisation for displaying distribution of MATHS scores by RACE.
Show the code
%>%
exam ggplot(
aes(x = RACE,
y = MATHS)) +
stat_gradientinterval(
fill = "skyblue",
show.legend = TRUE
+
) labs(
title = "Visualising Confidence Intervals of Mean Score for MATHS by RACE",
subtitle = "Gradient + interval plot"
)
5 Visualising Uncertainty with Hypothetical Outcome Plots (HOPs)
1οΈβ£ Step 1: Install ungeviz
package
::install_github("wilkelab/ungeviz") devtools
2οΈβ£ Step 2: Launch the application in R
library(ungeviz)
Show the code
ggplot(data = exam,
aes(x = factor(RACE), y = MATHS)) +
geom_point(position = position_jitter(
height = 0.3, width = 0.05),
size = 0.6, color = "darkolivegreen", alpha = 0.6) +
geom_hpline(data = sampler(25, group = RACE),
height = 0.6, color = "pink") +
theme_bw() +
transition_states(.draw, 1, 3)
Show the code
#.draw is a generated column indicating the sample draw.
Show the code
ggplot(data = exam,
aes(x = factor(RACE), y = ENGLISH)) +
geom_point(position = position_jitter(
height = 0.3, width = 0.05),
size = 0.5, color = "skyblue", alpha = 0.6) +
geom_hpline(data = sampler(25, group = RACE),
height = 0.6, color = "azure4") +
theme_bw() +
transition_states(.draw, 1, 3)
Show the code
#.draw is a generated column indicating the sample draw.
Show the code
ggplot(data = exam,
aes(x = factor(RACE), y = SCIENCE)) +
geom_point(position = position_jitter(
height = 0.3, width = 0.05),
size = 0.4, color = "tan1", alpha = 0.6) +
geom_hpline(data = sampler(25, group = RACE),
height = 0.6, color = "pink1") +
theme_bw() +
transition_states(.draw, 1, 3)
Show the code
#.draw is a generated column indicating the sample draw.
π Reading resource for HOPs: