P-value functions: Tutorial using the pvaluefunctions package

Denis Infanger

2021-11-30

Accompanying paper

We published an accompanying paper to illustrate the use of p-value functions:

Infanger D, Schmidt-Trucksäss A. (2019): P value functions: An underused method to present research results and to promote quantitative reasoning. Statistics in Medicine, 38: 4189-4197. doi: 10.1002/sim.8293.

Recreation of the graphics in the paper

The code and instructions to reproduce all graphics in our paper can be found in the following GitHub repository: https://github.com/DInfanger/pvalue_functions

Overview

This vignette shows how to use the pvaluefunctions package wich contains an R function to create graphics of p-value functions, confidence distributions, confidence densities, or the Surprisal value (S-value) (Greenland 2019).

Installation

Install (install.packages("pvaluefunctions")) and load the package from CRAN.

library(pvaluefunctions)

Dependencies

The function depends on the following R packages:

Usage

There is only one function needed to create the plots: conf_dist(). The function has the following arguments:

Required arguments for different estimate types

Returned values

The main function conf_dist() returns five objects in a list:

Examples

Two-sample t-test with unequal variances (Welch-Test)

#-----------------------------------------------------------------------------
# T-Test
#-----------------------------------------------------------------------------

with(sleep, mean(extra[group == 1])) - with(sleep, mean(extra[group == 2]))
#> [1] -1.58
t.test(extra ~ group, data = sleep, var.equal = FALSE)
#> 
#>  Welch Two Sample t-test
#> 
#> data:  extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
#> 95 percent confidence interval:
#>  -3.3654832  0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2 
#>            0.75            2.33

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
  estimate = c(-1.58)
  , df = c(17.77647)
  , tstat = c(-1.860813)
  , type = "ttest"
  , plot_type = "p_val"
  , n_values = 1e4L
  # , est_names = c("")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Mean difference (group 1 - group 2)"
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , x_scale = "line"
  , plot = TRUE
)

Single coefficient from a linear regression model

P-value function

Because it’s difficult to see very small p-values in the graph, you can set the option log_yaxis = TRUE so that p-values (i.e. the y-axes) below the value set in cut_logyaxis will be plotted on a logarithmic scale. This will make it much easier to see small p-values but has the disadvantage of creating a “kink” in the p-value function which is a pure artifact and puts an undue emphasis on the specified cutoff.

#-----------------------------------------------------------------------------
# Model
#-----------------------------------------------------------------------------

mod <- lm(Infant.Mortality~Agriculture + Fertility + Examination, data = swiss)

summary(mod)
#> 
#> Call:
#> lm(formula = Infant.Mortality ~ Agriculture + Fertility + Examination, 
#>     data = swiss)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -8.5375 -1.4021 -0.0066  1.7381  5.9150 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) 11.01896    4.47291   2.463  0.01784 * 
#> Agriculture -0.02143    0.02394  -0.895  0.37569   
#> Fertility    0.13115    0.04145   3.164  0.00285 **
#> Examination  0.04913    0.08351   0.588  0.55942   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.645 on 43 degrees of freedom
#> Multiple R-squared:  0.2291, Adjusted R-squared:  0.1753 
#> F-statistic:  4.26 on 3 and 43 DF,  p-value: 0.01014

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
  estimate = c(-0.02143)
  , df = c(43)
  , stderr = (0.02394)
  , type = "linreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  # , est_names = c("")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = TRUE
  , cut_logyaxis = 0.05
  , xlab = "Coefficient Agriculture"
  , xlim = c(-0.12, 0.065)
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = FALSE
  , plot = TRUE
)

Confidence distribution

res <- conf_dist(
  estimate = c(-0.02143)
  , df = c(43)
  , stderr = (0.02394)
  , type = "linreg"
  , plot_type = "cdf"
  , n_values = 1e4L
  # , est_names = c("")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  # , log_yaxis = TRUE
  # , cut_logyaxis = 0.05
  , xlab = "Coefficient Agriculture"
  , xlim = c(-0.12, 0.065)
  , together = FALSE
  , col = "#08A9CF"
  # , plot_p_limit = 1 - 0.999
  , plot_counternull = FALSE
)

The point where the confidence distribution is \(0.5\) is the median unbiased estimator (see Xie & Singh (2013) for a review and proofs).

Multiple coefficients from a linear regression model

P-value functions

res <- conf_dist(
  estimate = c(0.13115, 0.04913)
  , df = c(43, 43)
  , stderr = c(0.04145, 0.08351)
  , type = "linreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("Fertility", "Examination")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Regression coefficients"
  , together = TRUE
  , same_color = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = FALSE
  , inverted = FALSE
)

Surprisal values

res <- conf_dist(
  estimate = c(0.13115, 0.04913)
  , df = c(43, 43)
  , stderr = c(0.04145, 0.08351)
  , type = "linreg"
  , plot_type = "s_val"
  , n_values = 1e4L
  , est_names = c("Fertility", "Examination")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  # , log_yaxis = TRUE
  # , cut_logyaxis = 0.05
  , xlab = "Regression coefficients"
  , together = TRUE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
)

Pearson correlation coefficient (one-sided)

As of version 1.6.2, the package uses the exact distribution for Pearson’s correlation coefficient as described in a preprint by Gunnar Taraldsen (2020). The calculations make use of the Gaussian hypergeometric function (\(_2F_1\)). It also makes the calculations slower, depending on the value in n_values.

#-----------------------------------------------------------------------------
# Calculate Pearson's correlation coefficient
#-----------------------------------------------------------------------------

cor.test(swiss$Fertility, swiss$Agriculture, alternative = "two.sided", method = "pearson")
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  swiss$Fertility and swiss$Agriculture
#> t = 2.5316, df = 45, p-value = 0.01492
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.07334947 0.58130587
#> sample estimates:
#>       cor 
#> 0.3530792

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
  estimate = c(0.3530792)
  , n = 47
  , type = "pearson"
  , plot_type = "p_val"
  , n_values = 1e3L
  # , est_names = c("")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "one_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Pearson correlation"
  , together = TRUE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = FALSE
)

Odds ratio from logistic regression

#-----------------------------------------------------------------------------
# Create p-value function
#-----------------------------------------------------------------------------

res <- conf_dist(
  estimate = c(0.804037549)
  , stderr = c(0.331819298)
  , type = "logreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("GPA")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(log(1)) # null value on the log-odds scale
  , trans = "exp"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Odds Ratio (GPA)"
  , xlim = log(c(0.7, 5.2)) # axis limits on the log-odds scale
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , x_scale = "default"
)

Proportion

The p-value function (and thus the confidence intervals) are based on Wilson’s score interval and not the normal approximation. This means that the p-value function will never be outside the interval \([0, 1]\).

res <- conf_dist(
  estimate = c(0.44)
  , n = c(50)
  , type = "prop"
  , plot_type = "p_val"
  , n_values = 1e4L
  # , est_names = c("")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0.5)
  , trans = "exp"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Proportion"
  # , xlim = c(0.25, 0.65)
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , x_scale = "default"
)
#> 
#> Transformation changed to identity.

Difference between two independent proportions: Wilson’s score by Newcombe with continuity correction

res <- conf_dist(
  estimate = c(68/100, 98/150)
  , n = c(100, 150)
  , type = "propdiff"
  , plot_type = "p_val"
  , n_values = 1e4L
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Difference between proportions"
  , together = FALSE
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = FALSE
)

Difference between two independent proportions: Agresti-Caffo adjusted Wald interval

The standard Wald interval can be modified in a simple manner to drastically improve its coverage probabilities. Simply add 1 to the number of successes and add 2 to the sample size for both proportions. Then proceed to calculate the Wald interval with these modified data. The point estimate for the difference between proportions is still calculated using the unmodified data. The function conf_dist does not have a dedicaded type for this kind of estimator but as the Wald interval is based on the normal distribution, we can use type = general_z to create the p-value function.


# First proportion

x1 <- 8
n1 <- 40

# Second proportion

x2 <- 11
n2 <- 30

# Apply the correction 

p1hat <- (x1 + 1)/(n1 + 2)
p2hat <- (x2 + 1)/(n2 + 2)

# The estimator (unmodified)

est0 <- (x1/n1) - (x2/n2)

# The modified estimator and its standard error using the correction

est <- p1hat - p2hat
se <- sqrt(((p1hat*(1 - p1hat))/(n1 + 2)) + ((p2hat*(1 - p2hat))/(n2 + 2)))

res <- conf_dist(
  estimate = c(est)
  , stderr = c(se)
  , type = "general_z"
  , plot_type = "p_val"
  , n_values = 1e4L
  # , est_names = c("Estimate")
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , conf_level = c(0.95, 0.99)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , xlab = "Difference of proportions"
  # , xlim = c(-0.75, 0.5)
  , together = FALSE
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = FALSE
)

Confidence density of a variance estimate from a normal distribution

The confidence density of a variance estimate is skewed. This means that the mean, mode and median of the confidence density will not be identical, in general.

# Simulate some data from a normal distribution

set.seed(142857)
var_est <- var(x <- rnorm(20, 100, 15))

res <- conf_dist(
  estimate = var_est
  , n = length(x)
  , type = "var"
  , plot_type = "pdf"
  , n_values = 1e4L
  , est_names = c("Variance")
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , conf_level = c(0.95)
  # , null_values = c(15^2, 18^2)
  , trans = "identity"
  , alternative = "two_sided"
  , xlab = "Variance"
  , xlim = c(100, 900)
  , together = TRUE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
)
# Add vertical lines at the point estimates (mode, median, mean)

res$plot + ggplot2::geom_vline(xintercept = as.numeric(res$point_est[1, 1:3]), linetype = 2, size = 1)

P-value function and confidence distribution for the relative survival effect (1 - HR%)

Here, I’m going to replicate Figure 1 and Figure 2 in Bender et al. (2005). To do this, we first need to define the transformation that transforms the log-HR into the relative survival effect. By specifying trans = rse_fun, the values are automatically transformed and plotted. In addition, we will plot the p-value function inverted so that the cusp is located at the bottom. First the p-value function:

# Define the transformation function and its inverse for the relative survival effect

rse_fun <- function(x){
  100*(1 - exp(x))
}

rse_fun_inv <- function(x){
  log(1 - (x/100))
}

res <- conf_dist(
  estimate = c(log(0.72))
  , stderr = (0.187618)
  , type = "coxreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("RSE")
  , conf_level = c(0.95, 0.8, 0.5)
  , null_values = rse_fun_inv(c(0))
  , trans = "rse_fun"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Relative survival effect (1 - HR%)"
  , xlim = rse_fun_inv(c(-30, 60))
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , inverted = TRUE
  , title = "Figure 1 in Bender et al. (2005)"
  , x_scale = "default"
)


rm(rse_fun, rse_fun_inv)

Now the confidence distribution (Figure 2):

# Define the transformation function and its inverse for the relative survival effect

rse_fun <- function(x){
  100*(1 - exp(-x))
}

rse_fun_inv <- function(x){
  log(-(100)/(x - 100))
}

res <- conf_dist(
  estimate = c(-log(0.72))
  , stderr = (0.187618)
  , type = "coxreg"
  , plot_type = "cdf"
  , n_values = 1e4L
  , est_names = c("RSE")
  , conf_level = c(0.95, 0.883, 0.5)
  , null_values = rse_fun_inv(c(0))
  , trans = "rse_fun"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Relative survival effect (1 - HR%)"
  , together = FALSE
  , xlim = rse_fun_inv(c(-3, 60))
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , inverted = TRUE
  , title = "Figure 2 in Bender et al. (2005)"
)


rm(rse_fun, rse_fun_inv)

Comparing the precision of multiple estimates using the AUCC (area under the confidence curve)

The AUCC is a scalar representing the area under the p-value function (see Berrar (2017)). Values close to 0 indicate a high precision while larger values indicate poorer precision. Multiple estimates can be compared by their AUCC (in the context of a meta-analysis for example). The AUCC is calculated numerically using trapezoidal integration on the untransformed p-value function. The estimate can be quite poor if few x-values are used (i.e. argument n_values is low). If there are null values provided, the function will also return the proportion of the AUCC that lies above those nulls.


# Lungcancer dataset from the "meta" package

res <- conf_dist(
  estimate = c(2.512, 2.298, 2.455, 2.255, 2.989, 1.59, 2.674) # Log-incidence rate ratio (IRR)
  , stderr = c(0.087, 0.127, 0.144, 0.153, 0.265, 0.318, 0.584) # Standard errors of the log-IRR
  , type = "general_z"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("U.S. Veterans", "Men in 9 States", "Canadian Veterans", "Men in 25 States", "British Doctors", "California Legion", "California Occupational") # Study names
  , conf_level = c(0.95)
  , null_values = c(log(1))
  , trans = "exp"
  , alternative = "two_sided"
  , xlab = "Incidence rate ratio (IRR)"
  , xlim = c(log(0.95), log(50))
  , together = TRUE
  , same_color = TRUE
  , col = "#C977A2"
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = FALSE
  , inverted = FALSE
)


# Print the AUCCs
print(res$aucc_frame, row.names = FALSE)
#>                 variable      aucc null p_above_null
#>            U.S. Veterans 0.1388319    0    1.0000000
#>          Men in 9 States 0.2026627    0    1.0000000
#>        Canadian Veterans 0.2297908    0    1.0000000
#>         Men in 25 States 0.2441527    0    1.0000000
#>          British Doctors 0.4228789    0    1.0000000
#>        California Legion 0.5074546    0    0.9999999
#>  California Occupational 0.9319293    0    0.9999994

Similar packages

Check out the concurve package that offers a similar functionality.

References

Bender R, Berg G, Zeeb H. (2005): Tutorial: using confidence curves in medical research. Biom J. 47(2): 237-47.

Berrar D (2017): Confidence Curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach Learn 106:911-949.

Fraser D. A. S. (2019): The p-value function and statistical inference. Am Stat, 73:sup1, 135-147.

Greenland S (2019): Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution with S-Values. Am Stat, 73sup1, 106-114.

Infanger D, Schmidt-Trucksäss A. (2019): P value functions: An underused method to present research results and to promote quantitative reasoning. Stat Med, 38, 4189-4197. doi: 10.1002/sim.8293.

Poole C. (1987a): Beyond the confidence interval. Am J Public Health. 77(2): 195-9.

Poole C. (1987b) Confidence intervals exclude nothing. Am J Public Health. 77(4): 492-3.

Rosenthal R, Rubin DB. (1994): The counternull value of an effect size: A new statistic. Psychol Sci. 5(6): 329-34.

Schweder T, Hjort NL. (2016): Confidence, likelihood, probability: statistical inference with confidence distributions. New York, NY: Cambridge University Press.

Xie M, Singh K, Strawderman WE. (2011): Confidence Distributions and a Unifying Framework for Meta-Analysis. J Am Stat Assoc 106(493): 320-33. doi: 10.1198/jasa.2011.tm09803.

Xie Mg, Singh K. (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A review. Internat Statist Rev. 81(1): 3-39.

Contact

Denis Infanger

Session info

#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                        LC_CTYPE=German_Switzerland.1252   
#> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
#> [5] LC_TIME=German_Switzerland.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] pvaluefunctions_1.6.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] pracma_2.3.3       RColorBrewer_1.1-2 highr_0.9          bslib_0.3.1       
#>  [5] compiler_4.1.2     pillar_1.6.4       jquerylib_0.1.4    tools_4.1.2       
#>  [9] digest_0.6.28      jsonlite_1.7.2     evaluate_0.14      lifecycle_1.0.1   
#> [13] tibble_3.1.6       gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.12      
#> [17] DBI_1.1.1          parallel_4.1.2     yaml_2.2.1         xfun_0.28         
#> [21] fastmap_1.1.0      stringr_1.4.0      dplyr_1.0.7        knitr_1.36        
#> [25] generics_0.1.1     sass_0.4.0         vctrs_0.3.8        grid_4.1.2        
#> [29] tidyselect_1.1.1   glue_1.5.0         R6_2.5.1           fansi_0.5.0       
#> [33] rmarkdown_2.11     zipfR_0.6-70       farver_2.1.0       ggplot2_3.3.5     
#> [37] purrr_0.3.4        magrittr_2.0.1     scales_1.1.1       htmltools_0.5.2   
#> [41] ellipsis_0.3.2     assertthat_0.2.1   colorspace_2.0-2   utf8_1.2.2        
#> [45] stringi_1.7.6      gsl_2.1-7.1        munsell_0.5.0      crayon_1.4.2