A user’s guide to random measurement error correction in a covariate for R

Linda Nab

2021-12-01

Introduction

mecor is an R package for Measurement Error CORrection. mecor implements measurement error correction methods for linear models with continuous outcomes. The measurement error can either occur in a continuous covariate or in the continuous outcome. This vignette discusses how a sensitivity analysis for random covariate measurement error is conducted in mecor.

Regression calibration is one of the most popular measurement error correction methods for covariate measurement error. This vignette shows how regression calibration is used in mecor to correct for random measurement error in a covariate. Our interest lies in estimating the association between a continuous reference exposure \(X\) and a continuous outcome \(Y\), given covariates \(Z\). Instead of \(X\), the substitute error-prone exposure \(X^*\) is measured, assumed with random measurement error. It is further assumed that there is no extra information available to quantify the random measurement error in \(X^*\). The input for our measurement error correction therefore is constrained to informed guesses about the size of the random measurement error. Literature or expert knowledge could be used to inform these guesses. We refer to the vignettes discussing e.g. standard regression calibration for random measurement error correction when validation data is available.

Random measurement error

We assume that \(X^*\) is measured with random measurement error. This means that we assume that \(X^* = X + U\), where \(U\) has mean 0 and variance \(\tau^2\). More specifically, we assume non-differential random measurement error, i.e. \(X^*|X\) is independent of \(Y\) (our outcome).

Random measurement error correction in mecor

The object MeasErrorRandom() in mecor is used for random measurement error correction in a covariate. We explain the usage of the MeasErrorRandom() object in the following. We first introduce the simulated data set vat. The simulated data set vat is an internal covariate-validation study. We will use this data set to explore random measurement error correction without using the reference measure visceral adipose tissue \(VAT\) that is available in the data set. The data set vat contains 1000 observations of the outcome insulin resistance \(IR_{ln}\), the error-prone exposure waist circumference \(WC\) and the covariate age \(age\). The reference exposure \(VAT\) is observed in approximately 25% of the individuals in the study, but will be ignored. In this example, we assume that there is random measurement error in the exposure \(WC\).

# load internal covariate validation study
data("vat", package = "mecor")
head(vat)
#>         ir_ln         wc sex age        tbf       vat
#> 1 -0.09341837 -1.3136816   1  48 -0.6571345        NA
#> 2  0.16820894 -2.0336624   0  54 -1.5882163        NA
#> 3  0.57299976 -0.2611214   0  46 -1.1033709        NA
#> 4  0.63677178  0.8631987   0  55 -1.4785869 0.5083247
#> 5  0.92908882 -1.2054861   1  61  0.9020136        NA
#> 6 -0.72410039 -2.5032852   1  47 -0.9584166        NA

When ignoring the measurement error in \(WC\), one would naively regress \(WC\) and \(age\) on \(IR_{ln}\). This results in a biased estimation of the exposure-outcome association:

# naive estimate of the exposure-outcome association
data(vat)
lm(ir_ln ~ wc + age, data = vat)
#> 
#> Call:
#> lm(formula = ir_ln ~ wc + age, data = vat)
#> 
#> Coefficients:
#> (Intercept)           wc          age  
#>    -0.13015      0.25627      0.01211

Suppose that \(VAT\) is not observed in the internal covariate-validation study vat. To correct the bias in the naive association between exposure \(WC\) and outcome \(IR_{ln}\) given \(age\), we need to make an informed guess about the quantity of \(\tau^2\). Suppose we assume \(\tau^2 = 0.25\). One can proceed as follows using mecor():

# Use MeasErrorRandom for measurement error correction:
mecor(ir_ln ~ MeasErrorRandom(substitute = wc, variance = 0.25) + age,
      data = vat)
#> 
#> Call:
#> mecor(formula = ir_ln ~ MeasErrorRandom(substitute = wc, variance = 0.25) + 
#>     age, data = vat)
#> 
#> Coefficients Corrected Model:
#> (Intercept)      cor_wc         age 
#> -0.02993932  0.32125598  0.01116984 
#> 
#> Coefficients Uncorrected Model:
#> (Intercept)          wc         age 
#> -0.13014970  0.25626815  0.01211408

How does mecor do this?

To correct for the random measurement error in \(WC\), mecor constructs the calibration model matrix as follows:

# First, construct the variance--covariance matrix of X_star and Z: 
# ( Var(X_star)   Cov(X_star, Z)
#   Cov(Z,X_star) Var(Z)       )
# To do so, we design Q, a matrix with 1000 rows (number of observations) and 2 
# columns. The first column of Q contains all 1000 observations of X_star, each 
# minus the mean of X_star. The second column of Q contains all 1000 obervations 
# of Z, each minus the mean of Z. 
Q <- scale(cbind(vat$wc, vat$age), scale = F)
# Subsequently, the variance--covariance matrix of X_star and Z is constructed:
matrix <- t(Q) %*% Q / (length(vat$age) - 1)
# Then, the variance--covariance matrix of X and Z is constructed, by using:
# Var(X) = Var(X_star) - Var(U) <--- Var(U) is the assumed tau^2
# Cov(X, Z) = Cov(X_star, Z)    <--- since U is assumed independent of Z
matrix1 <- matrix
matrix1[1, 1] <- matrix1[1, 1] - 0.25 # tau^2 = 0.25
# Rosner et al. (1992) show that the calibration model matrix can be constructed
# by taking the inverse of the variance--covariance matrix of X and Z and by
# matrix multiplying that matrix with the variance--covariance matrix of X_star
# and Z. 
model_matrix <- solve(matrix1) %*% matrix
model_matrix
#>              [,1]          [,2]
#> [1,]  1.253593085 -5.551115e-17
#> [2,] -0.003684601  1.000000e+00
matrix1 %*% solve(matrix)
#>               [,1]        [,2]
#> [1,]  7.977070e-01 0.002939232
#> [2,] -5.551115e-17 1.000000000
# The resulting matrix is now:
# (1/lambda1        0
#  -lambda2/lambda1 1)
# Where,
# lambda1 = Cov(X,X_star|Z) / Var(X_star|Z)
# lambda2 = Cov(X,Z|X_star) / Var(Z|X_star) 
# Or, more familiar, the calibration model,
# E[X|X_star, Z] = lambda0 + lambda1 * X_star + lambda2 * Z
lambda1 <- 1 / model_matrix[1, 1]
lambda2 <- model_matrix[2,1] * - lambda1
# From standard theory, we have,
# lambda0 = mean(X) - lambda1 * mean(X_star) - lambda2 * mean(Z)
# mean(X) = mean(X_star) since we assume random measurement error
lambda0 <- mean(vat$wc) - lambda1 * mean(vat$wc) - lambda2 * mean(vat$age)
# The calibration model matrix Lambda is defined as:
# (lambda1 lambda0 lambda2
#  0       1       0
#  0       0       1)
model_matrix <- diag(3)
model_matrix[1, 1:3] <- c(lambda1, lambda0, lambda2)
model_matrix
#>          [,1]       [,2]        [,3]
#> [1,] 0.797707 -0.3119331 0.002939232
#> [2,] 0.000000  1.0000000 0.000000000
#> [3,] 0.000000  0.0000000 1.000000000
# The calibration model matrix is standard output of mecor, and can be found
# using:
mecor_fit <- mecor(ir_ln ~ MeasErrorRandom(wc, 0.25) + age,
                   data = vat)
mecor_fit$corfit$matrix
#>          Lambda1    Lambda0     Lambda3
#> Lambda1 0.797707 -0.3119331 0.002939232
#> Lambda0 0.000000  1.0000000 0.000000000
#> Lambda3 0.000000  0.0000000 1.000000000

Subsequently, the naive estimates of the outcome model are multiplied by the inverse of the calibration model matrix to obtain corrected estimates of the outcome model.

# Fit naive outcome model
naive_fit <- lm(ir_ln ~ wc + age, 
                data = vat)
# Save coefficients
beta_star <- naive_fit$coefficients
# To prepare the coefficients for the measurement error correction, exchange the
# intercept and the coefficient for X_star
beta_star[1:2] <- rev(beta_star[1:2]) 
# Perform the measurement error correction:
beta <- beta_star %*% solve(model_matrix)
# Reverse the order 
beta[1:2] <- rev(beta[1:2])
beta # corrected coefficients of the outcome model
#>             [,1]     [,2]       [,3]
#> [1,] -0.02993932 0.321256 0.01116984

Which exactly matches the output of mecor() above.