Hands-on_Ex04(4) - Funnel Plots for Fair Comparisons
pacman::p_load(tidyverse, FunnelPlotR, plotly, knitr)
1 Overview
A funnel plot is a data visualisation designed for unbiased comparison between outlets, stores or business entities. We will learn the following in this chapter:
- Plot funnel plots using the FunnelPlotR package
- Plot static funnel plots with the ggplot2 package
- Plot interactive funnel plots by using both the plotly R and ggplot2 packages
2 Install and Launch R Packages
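Five R packages will be used in this exercise. They are loaded by the pacman::p_load() call at the top of this document; readr and ggplot2 are loaded as part of tidyverse.
- readr: import csv to R
- FunnelPlotR: create funnel plot
- ggplot2: create funnel plot manually
- knitr: build static html table
- plotly: create interactive funnel plot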
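pacman::p_load() installs any package that is missing before loading it. To see which of the packages are not yet installed without installing anything, a minimal base R sketch could look like this (the object names pkgs and installed are illustrative, not part of the exercise):

```r
# Packages named in the p_load() call of this exercise
pkgs <- c("tidyverse", "FunnelPlotR", "plotly", "knitr")

# requireNamespace() returns TRUE when a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that p_load() would have to install first
pkgs[!installed]
```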
3 Import Data
We use a new data set for this exercise: COVID-19_DKI_Jakarta, as of 31st July 2021, from the Open Data Covid-19 Provinsi DKI Jakarta portal. In this exercise, we will compare the cumulative COVID-19 cases and deaths by sub-district (i.e. kelurahan).
First, we import the data into R and save it into a tibble data frame object called covid19.
covid19 <- read_csv("data/COVID-19_DKI_Jakarta.csv") %>%
  mutate_if(is.character, as.factor)
| Sub-district ID | City | District | Sub-district | Positive | Recovered | Death |
| 3172051003 | JAKARTA UTARA | PADEMANGAN | ANCOL | 1776 | 1691 | 26 |
| 3173041007 | JAKARTA BARAT | TAMBORA | ANGKE | 1783 | 1720 | 29 |
| 3175041005 | JAKARTA TIMUR | KRAMAT JATI | BALE KAMBANG | 2049 | 1964 | 31 |
| 3175031003 | JAKARTA TIMUR | JATINEGARA | BALI MESTER | 827 | 797 | 13 |
| 3175101006 | JAKARTA TIMUR | CIPAYUNG | BAMBU APUS | 2866 | 2792 | 27 |
| 3174031002 | JAKARTA SELATAN | MAMPANG PRAPATAN | BANGKA | 1828 | 1757 | 26 |
4 FunnelPlotR Methods
The FunnelPlotR package uses ggplot2 to generate funnel plots. It requires a numerator (events of interest), a denominator (population to be considered) and a group. The key arguments selected for customisation are:
- limit: plot limits (95 or 99)
- label_outliers: to label outliers (true or false)
- Poisson_limits: to add Poisson limits to the plot
- OD_adjust: to add overdispersed limits to the plot
- xrange and yrange: to specify the range to display for axes, acts like a zoom function
- Other aesthetic components, such as graph title, axis labels etc.
FunnelPlotR Installation
install.packages("FunnelPlotR")
4.1 FunnelPlotR methods: Basic plot
The code below plots a funnel plot.
funnel_plot(
  .data = covid19,
  numerator = Death,
  denominator = Positive,
  group = `Sub-district`
)
A funnel plot object with 267 points of which 1 are outliers.
Plot is adjusted for overdispersion.
- group in this function is different from the scatterplot. Here it defines the level of the points to be plotted, i.e., Sub-district, District or City. If City is chosen, there are only six data points.
- By default, the data_type argument is "SR".
- limit: plot limits, accepted values are 95 or 99, corresponding to 95% or 99.8% quantiles of the distribution.
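To illustrate what the level chosen for group means, the base R sketch below pools the six sample rows shown in section 3 by City. It does not call FunnelPlotR; it assumes that a coarser group corresponds to summing the numerator and denominator within each group, and the names sample_rows and by_city are illustrative only.

```r
# Six sample rows copied from the table in section 3 (not the full data set)
sample_rows <- data.frame(
  City     = c("JAKARTA UTARA", "JAKARTA BARAT", "JAKARTA TIMUR",
               "JAKARTA TIMUR", "JAKARTA TIMUR", "JAKARTA SELATAN"),
  Positive = c(1776, 1783, 2049, 827, 2866, 1828),
  Death    = c(26, 29, 31, 13, 27, 26)
)

# Sum deaths and positive cases within each city: one row (one point) per city
by_city <- aggregate(cbind(Death, Positive) ~ City,
                     data = sample_rows, FUN = sum)

# Pooled fatality rate per city
by_city$rate <- by_city$Death / by_city$Positive
by_city
```

The six sub-district rows collapse into four city rows here, because the sample covers only four cities. With the full data set, the same pooling gives the six city-level points mentioned above.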
4.2 FunnelPlotR methods: Makeover 1
Adjust the code to make over the previous plot.
funnel_plot(
  .data = covid19,
  numerator = Death,
  denominator = Positive,
  group = `Sub-district`,
  data_type = "PR",
  xrange = c(0, 6500),
  yrange = c(0, 0.05)
)
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
- The data_type argument is used to change from the default "SR" to "PR" (i.e., proportions).
- xrange and yrange are used to set the range of the x-axis and y-axis.
4.3 FunnelPlotR methods: Makeover 2
Makeover 2 adds a plot title and titles for the x-axis and y-axis, and removes the outlier labels.
funnel_plot(
  .data = covid19,
  numerator = Death,
  denominator = Positive,
  group = `Sub-district`,
  data_type = "PR",
  xrange = c(0, 6500),
  yrange = c(0, 0.05),
  label = NA,
  title = "Cumulative COVID-19 Fatality Rate by \nCumulative Total Number of Positive Cases",
  x_label = "Cumulative COVID-19 Positive Cases",
  y_label = "Cumulative Fatality Rate"
)
A funnel plot object with 267 points of which 7 are outliers.
Plot is adjusted for overdispersion.
- The label = NA argument removes the default outlier-labelling feature.
- The title argument is used to add a plot title.
- The x_label and y_label arguments are used to add/edit the x-axis and y-axis titles.
5 Funnel Plot for Fair Visual Comparison: ggplot2 methods
In this section, we will build funnel plots step-by-step with ggplot2. This will strengthen our skills in using ggplot2 to customise specialised data visualisations such as the funnel plot.
5.1 Computing the basic derived fields
To plot the funnel plot from scratch, we first need to derive:
- the cumulative death rate, and
- the standard error of the cumulative death rate.
df <- covid19 %>%
  mutate(rate = Death / Positive) %>%
  mutate(rate.se = sqrt((rate * (1 - rate)) / (Positive))) %>%
  filter(rate > 0)
Next, fit.mean, the mean of the rates weighted by the inverse of their squared standard errors, is computed by using the code below:
fit.mean <- weighted.mean(df$rate, 1 / df$rate.se^2)
5.2 Calculate lower and upper limits for 95% and 99.9% CI
Next, we will compute the lower and upper limits for the 95% and 99.9% confidence intervals.
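For reference, the derived fields from section 5.1 and the limits computed in this section can be written as follows. The notation is ours and is only a restatement of the R code: d_i and n_i stand for Death and Positive in sub-district i, r_i for rate, se_i for rate.se, \bar{r} for fit.mean, and n for each value of number.seq.

```latex
r_i = \frac{d_i}{n_i}, \qquad
\mathrm{se}_i = \sqrt{\frac{r_i (1 - r_i)}{n_i}}, \qquad
\bar{r} = \frac{\sum_i r_i / \mathrm{se}_i^{2}}{\sum_i 1 / \mathrm{se}_i^{2}}

\text{limits}(n) = \bar{r} \pm z \sqrt{\frac{\bar{r} (1 - \bar{r})}{n}}, \qquad
z = 1.96 \ (95\%), \quad z = 3.29 \ (99.9\%)
```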
number.seq <- seq(1, max(df$Positive), 1)
number.ll95 <- fit.mean - 1.96 * sqrt((fit.mean * (1 - fit.mean)) / (number.seq))
number.ul95 <- fit.mean + 1.96 * sqrt((fit.mean * (1 - fit.mean)) / (number.seq))
number.ll999 <- fit.mean - 3.29 * sqrt((fit.mean * (1 - fit.mean)) / (number.seq))
number.ul999 <- fit.mean + 3.29 * sqrt((fit.mean * (1 - fit.mean)) / (number.seq))
dfCI <- data.frame(number.ll95, number.ul95,
                   number.ll999, number.ul999,
                   number.seq, fit.mean)
Under a normal approximation, 95% of the data falls within 1.96 standard deviations of the mean, and 99.9% of the data falls within 3.29 standard deviations of the mean.
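As a small check of where the multipliers 1.96 and 3.29 come from, and of how the limits behave, the base R sketch below repeats the calculation on the six sample rows shown in section 3. It is an illustration only: the names prefixed with toy are hypothetical and chosen so that they do not overwrite df, fit.mean or dfCI, and the limits are evaluated at each row's own number of positive cases rather than along number.seq.

```r
# Six sample rows copied from the table in section 3 (not the full data set)
toy <- data.frame(
  Positive = c(1776, 1783, 2049, 827, 2866, 1828),
  Death    = c(26, 29, 31, 13, 27, 26)
)

# Two-sided standard normal quantiles behind the 1.96 and 3.29 multipliers
z95  <- qnorm(1 - 0.05 / 2)
z999 <- qnorm(1 - 0.001 / 2)

# Same derived fields as in section 5.1
toy$rate    <- toy$Death / toy$Positive
toy$rate.se <- sqrt(toy$rate * (1 - toy$rate) / toy$Positive)
toy.mean    <- weighted.mean(toy$rate, 1 / toy$rate.se^2)

# Limits evaluated at each row's own denominator
toy.se.n  <- sqrt(toy.mean * (1 - toy.mean) / toy$Positive)
toy$ll95  <- toy.mean - z95  * toy.se.n
toy$ul95  <- toy.mean + z95  * toy.se.n
toy$ll999 <- toy.mean - z999 * toy.se.n
toy$ul999 <- toy.mean + z999 * toy.se.n

# Flag rows whose rate falls outside the 95% limits
toy$outside95 <- toy$rate < toy$ll95 | toy$rate > toy$ul95

round(c(z95 = z95, z999 = z999), 2)
toy
```

The limits are widest for the row with the fewest positive cases and narrow as the denominator grows, which is what gives the plot its funnel shape.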
5.3 Plot a static funnel plot
Use the following code to plot a static funnel plot with ggplot2.
p <- ggplot(df, aes(x = Positive, y = rate)) +
  # label is not a geom_point() aesthetic (ggplot2 warns about it);
  # it is mapped here so that ggplotly() can show it in the tooltip later
  geom_point(aes(label = `Sub-district`),
             alpha = 0.4) +
  # 95% limits (dashed); linewidth needs ggplot2 >= 3.4.0, use size on older versions
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll95),
            linewidth = 0.4,
            colour = "skyblue",
            linetype = "dashed") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul95),
            linewidth = 0.4,
            colour = "skyblue",
            linetype = "dashed") +
  # 99.9% limits (solid)
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ll999),
            linewidth = 0.4,
            colour = "skyblue") +
  geom_line(data = dfCI,
            aes(x = number.seq,
                y = number.ul999),
            linewidth = 0.4,
            colour = "skyblue") +
  # overall weighted mean rate, drawn once
  geom_hline(yintercept = fit.mean,
             linewidth = 0.4,
             colour = "grey40") +
  coord_cartesian(ylim = c(0, 0.05)) +
  # label the upper limits at their right-hand end, inside the visible y range
  annotate("text", x = max(dfCI$number.seq), y = tail(dfCI$number.ul95, 1),
           label = "95%", size = 3, colour = "grey40", hjust = 1, vjust = 1.5) +
  annotate("text", x = max(dfCI$number.seq), y = tail(dfCI$number.ul999, 1),
           label = "99.9%", size = 3, colour = "grey40", hjust = 1, vjust = -0.5) +
  ggtitle("Cumulative Fatality Rate by Cumulative Number of COVID-19 Cases") +
  xlab("Cumulative Number of COVID-19 Cases") +
  ylab("Cumulative Fatality Rate") +
  theme_light() +
  theme(plot.title = element_text(size = 12),
        legend.position = c(0.91, 0.85),
        legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.background = element_rect(colour = "grey60", linetype = "dotted"),
        legend.key.height = unit(0.3, "cm"))
p
5.4 Interactive funnel plot: plotly + ggplot2
To make the funnel plot interactive, we can use ggplot2 together with ggplotly() from the plotly R package.
fp_ggplotly <- ggplotly(p,
                        tooltip = c("label",
                                    "x",
                                    "y"))
fp_ggplotly
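The tooltip argument selects which aesthetic mappings appear when hovering over a point, which is why the sub-district name was mapped to label in the static plot. The sketch below shows the same mechanism in isolation on three of the sample rows from section 3. The names toy_points, toy_plot and toy_widget are illustrative, and ggplot2 is expected to warn that label is an unknown aesthetic for geom_point().

```r
library(ggplot2)
library(plotly)

# Three sample rows copied from the table in section 3
toy_points <- data.frame(
  name     = c("ANCOL", "ANGKE", "BALE KAMBANG"),
  Positive = c(1776, 1783, 2049),
  Death    = c(26, 29, 31)
)
toy_points$rate <- toy_points$Death / toy_points$Positive

# Map the name to label so that it is available to the tooltip
toy_plot <- ggplot(toy_points, aes(x = Positive, y = rate)) +
  geom_point(aes(label = name))

# Show only the label, x and y mappings when hovering over a point
toy_widget <- ggplotly(toy_plot, tooltip = c("label", "x", "y"))
toy_widget
```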