isni

The `sos` example

sos is a dataset from a cross-sectional survey of sexual practices among students at the University of Edinburgh. The response variable is the students’ answer to the question “Have you ever had sexual intercourse?”. Because of the sensitivity of this question, many students declined to answer, leading to substantial missing data. We consider a simplified data set consisting of the answer to this question, with the student’s sex and faculty as predictors.

library(isni)
data(sos)
sos[sample(nrow(sos),10),]

##      sexact gender faculty
## 4578     no female   other
## 1748   <NA>   male   other
## 5041   <NA> female   other
## 5371   <NA> female   other
## 2028   <NA>   male   other
## 1464     no   male   other
## 885     yes   male   other
## 2476   <NA>   male   other
## 5086   <NA> female   other
## 5350   <NA> female   other

The R code above loads the isni package and the data frame sos, and displays a random subsample of \(10\) records. sos includes the following factor variables: sexact is the response to the question “Have you ever had sexual intercourse?” (two levels: no (reference level), yes); gender is the student’s sex (two levels: male (reference level), female); faculty is the student’s faculty (two levels: other, covering all other faculty categories (reference level), and mdv, covering medical/dental/veterinary).

Assuming ignorable nonresponse, one can fit a logistic model (using responders only) to predict the outcome by sex, faculty and their interaction. We estimate the model with the function glm():

ymodel= sexact  ~ gender*faculty
summary(glm(ymodel,family=binomial, data=sos))

## 
## Call:
## glm(formula = ymodel, family = binomial, data = sos)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6713  -1.3282   0.7540   0.7642   1.0338  
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.08153    0.05561  19.448  < 2e-16 ***
## genderfemale             0.03081    0.07958   0.387    0.699    
## facultymdv              -0.73389    0.14921  -4.918 8.73e-07 ***
## genderfemale:facultymdv  0.10213    0.20670   0.494    0.621    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4450.3  on 3827  degrees of freedom
## Residual deviance: 4408.2  on 3824  degrees of freedom
##   (2308 observations deleted due to missingness)
## AIC: 4416.2
## 
## Number of Fisher Scoring iterations: 4
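To read the coefficients on the probability scale, the fitted log-odds can be passed through the inverse logit. The sketch below hard-codes the estimates printed above rather than refitting the model, so it runs without the isni package; the names `b`, `eta` and `p_yes` are illustrative only.

```r
# Coefficients copied from the glm() output above (responders only, MAR fit).
b <- c(intercept = 1.08153, female = 0.03081, mdv = -0.73389, female_mdv = 0.10213)

# Linear predictor (log-odds of answering yes) for each gender-by-faculty cell.
eta <- c(
  male_other   = b[["intercept"]],
  female_other = b[["intercept"]] + b[["female"]],
  male_mdv     = b[["intercept"]] + b[["mdv"]],
  female_mdv   = b[["intercept"]] + b[["female"]] + b[["mdv"]] + b[["female_mdv"]]
)

# plogis() is the inverse logit: fitted probability of answering yes.
p_yes <- plogis(eta)
round(p_yes, 3)
```

Under these estimates the fitted probabilities for the two mdv cells are lower than for the corresponding other cells, which is what the negative facultymdv coefficient expresses.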

The estimates show that students in a medical faculty were less likely to report having had sexual intercourse. Because only 62.4% of the students responded to the sexual practice question, there is concern that this analysis is sensitive to the assumption of ignorability. To assess this, one can conduct an ISNI analysis for this model with the function isniglm(). We posit a nonignorable nonresponse model of the following form \[\begin{eqnarray} \text{logit}\{\text{Prob}(\text{is.na(sexact)}=\text{"yes"})\}=\gamma_{0}^T s +\gamma_1 \, \text{sexact} \end{eqnarray}\] where the observed missingness predictor \(s\) includes gender, faculty and their interaction.

In this nonresponse model, the probability of nonresponse to the sexual practice question is associated with the observed missingness predictor \(s\) via the parameter \(\gamma_0\) and with the partially missing outcome sexact via the parameter \(\gamma_1\). The nonignorability parameter \(\gamma_1\) captures the magnitude and nature of the nonignorable missingness. When \(\gamma_1=0\), the nonresponse is ignorable in the sense that the probability of missingness is independent of the unobserved values of sexact, and the MAR analysis above provides consistent and valid estimates. When \(\gamma_1\) departs from zero, the nonresponse becomes nonignorable and the MAR estimates above are subject to selection bias. The ISNI functions (specifically isniglm() for this example) evaluate the rate of change of the model estimates in the neighborhood of the MAR model, where the missingness probability is allowed to depend on the unobserved value of sexact even after conditioning on the other missingness predictors in \(s\).
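To make the role of \(\gamma_1\) concrete, the sketch below evaluates the nonresponse model for a single covariate pattern. The baseline log-odds `g0` is a hypothetical value chosen only for illustration (it is not estimated from sos), the helper names are made up for this example, and sexact is assumed to be coded 0 for no and 1 for yes.

```r
# Hypothetical baseline log-odds of nonresponse for one covariate pattern
# (an assumed value for illustration, not an estimate from sos).
g0 <- -0.5

# Probability of nonresponse given the (possibly unobserved) answer y,
# following logit(P(missing)) = g0 + g1 * y.
p_missing <- function(y, g1) plogis(g0 + g1 * y)

# Odds ratio of nonresponse, comparing yes (y = 1) with no (y = 0).
odds <- function(p) p / (1 - p)
or_missing <- function(g1) odds(p_missing(1, g1)) / odds(p_missing(0, g1))

or_missing(0)  # ignorable case: missingness does not depend on y
or_missing(1)  # nonignorable case: the odds ratio is exp(g1)
```

Whatever value is assumed for `g0`, the odds ratio depends only on `g1`, which is why \(\gamma_1\) alone indexes the departure from MAR.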

A simple ISNI analysis can be conducted using the isniglm function as follows:

sos.isni<-isniglm(ymodel, family=binomial, data=sos)

## # weights:  5 (4 variable)
## initial  value 4253.151100 
## final  value 4027.954580 
## converged

sos.isni

## 
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
## 
## ISNIs:
##             (Intercept)             genderfemale               facultymdv  
##                0.410141                -0.038983                -0.169859  
## genderfemale:facultymdv  
##                0.027542  
## 
## c statistics:
##             (Intercept)             genderfemale               facultymdv  
##                 0.13559                  2.04146                  0.87846  
## genderfemale:facultymdv  
##                 7.50482  
## 
## Residual Deviance of the MAR model: 4408.2
## 
## AIC of the MAR model: 4416.2

The summary() function in the package displays the results stored in the isniglm() object as a table:

summary(sos.isni)

## 
## Call:
## isniglm(formula = ymodel, family = binomial, data = sos)
## 
##                          MAR Est.  Std. Err      ISNI      c
## (Intercept)              1.081531  0.055611  0.410141 0.1356
## genderfemale             0.030808  0.079583 -0.038983 2.0415
## facultymdv              -0.733886  0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133  0.206696  0.027542 7.5048

The columns MAR Est. and Std. Err denote the logistic model estimates and their standard errors under MAR; ISNI and c denote the ISNI values and c statistics. Recall that ISNI denotes the approximate change in the MLEs when \(\gamma_1\) in the selection model is changed from \(0\) to \(1\). Under our nonignorable selection model, \(\gamma_1=1\) means that answering yes rather than no multiplies a student’s odds of nonresponse by about 2.7 (\(e^1\)). Thus, subjects whose true value is yes would be more likely to have a missing value, and the naive MAR estimate for (Intercept) should be less than the (Intercept) estimate under the correct nonignorable model. The positive sign of the ISNI value for (Intercept) is consistent with this prediction. The ISNI for the faculty predictor is \(-0.17\), indicating that if, as is more plausible here, \(\gamma_1 = 1\), the MLE of the faculty coefficient should change from \(-0.73\) to \(-0.90\). If \(\gamma_1 = -1\), the estimate would change from \(-0.73\) to \(-0.56\).
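The arithmetic behind these shifted estimates can be sketched in a few lines. The values are copied from the summary table above rather than extracted from the fitted object, and `approx_est()` is a hypothetical helper that applies the first-order approximation "MAR estimate plus \(\gamma_1\) times ISNI".

```r
# Values copied from the summary(sos.isni) table above.
mar_est <- c("(Intercept)" = 1.081531, genderfemale = 0.030808,
             facultymdv = -0.733886, "genderfemale:facultymdv" = 0.102133)
isni <- c("(Intercept)" = 0.410141, genderfemale = -0.038983,
          facultymdv = -0.169859, "genderfemale:facultymdv" = 0.027542)

# First-order approximation of the estimates at a given gamma1
# (hypothetical helper, not part of the isni package).
approx_est <- function(gamma1) mar_est + gamma1 * isni

approx_est(1)    # approximate estimates if gamma1 = 1
approx_est(-1)   # approximate estimates if gamma1 = -1
exp(1)           # odds ratio of nonresponse implied by gamma1 = 1
```

Because the approximation is linear in \(\gamma_1\), it is meant for the neighborhood of the MAR model and should not be read as an exact refit under strong nonignorability.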

The column c presents the \(c\) statistics, which approximate the minimum magnitude of nonignorability needed for the change in an MLE to equal one standard error (\(\text{SE}\)). One can then assess sensitivity by evaluating whether this level of nonignorability is plausible. For our sos example with a binary outcome, the \(c\) statistic is defined as \[\begin{eqnarray} c= \left| \frac{\text{SE} }{\text{ISNI}}\right|. \end{eqnarray}\] Because ISNI approximates the change in the MLE per unit change in \(\gamma_1\), \(c\) is approximately the magnitude of \(\gamma_1\) at which that change reaches one SE. For instance, \(c=1\) indicates that for selection bias to be as large as the sampling error, the nonignorability needs to be at least as strong as that in which a one-unit change in sexact is associated with an odds ratio of 2.7 for being missing.
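As a quick check of the definition, the c column can be recomputed from the Std. Err and ISNI columns. This is a minimal sketch using numbers copied from the summary table above, not output extracted from the sos.isni object.

```r
# Standard errors and ISNI values copied from the summary(sos.isni) table.
se <- c("(Intercept)" = 0.055611, genderfemale = 0.079583,
        facultymdv = 0.149215, "genderfemale:facultymdv" = 0.206696)
isni <- c("(Intercept)" = 0.410141, genderfemale = -0.038983,
          facultymdv = -0.169859, "genderfemale:facultymdv" = 0.027542)

# c statistic: |SE / ISNI|, the approximate |gamma1| at which the change
# in the estimate equals one standard error.
c_stat <- abs(se / isni)
round(c_stat, 4)
```

The absolute value matters here: the sign of ISNI gives the direction of the bias, while \(c\) only measures how much nonignorability is needed to reach one SE.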

When \(c\) is large, only extreme nonignorability can make the estimate change substantially, and consequently sensitivity to nonignorability is of little concern. For example, \(c=10\) implies that for the error in an MAR estimate to be the same size as its sampling error, the nonignorability needs to be strong enough that a mere \(0.1\)-unit change in sexact causes a significant change in the odds of being missing. When \(c\) is small, a modest departure from MAR can cause the estimate to change substantially. For example, \(c=0.1\) implies that the estimate may change substantially even when it takes a \(10\)-unit change in sexact to cause a significant change in the odds of being missing. As such a degree of nonignorability is plausible in many applications, this small \(c\) value signals sensitivity. Prior research suggests \(c<1\) as a rule of thumb to signal significant sensitivity.

In the sos example, the \(c\) statistics for (Intercept) and faculty are both less than \(1\), suggesting that these coefficients are sensitive to nonignorability, confirming previous findings. Prior research also found that neither the gender nor the interaction term between gender and faculty should be sensitive, as our findings using ISNI confirm.
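The rule of thumb can be applied mechanically, as in the sketch below. The \(c\) values are copied from the summary table; the conversion `exp(c_stat)` is an interpretive aid that assumes sexact is coded 0/1, so that \(\gamma_1\) is the log odds ratio of nonresponse for yes versus no.

```r
# c statistics copied from the summary(sos.isni) table above.
c_stat <- c("(Intercept)" = 0.1356, genderfemale = 2.0415,
            facultymdv = 0.8785, "genderfemale:facultymdv" = 7.5048)

# Rule of thumb from the text: c < 1 signals sensitivity to nonignorability.
sensitive <- names(c_stat)[c_stat < 1]
sensitive

# Odds ratio of nonresponse (yes vs. no) at which the approximate change
# in each estimate reaches one standard error.
round(exp(c_stat), 2)
```

Reading \(c\) as an odds ratio makes the plausibility judgment easier: a coefficient is of concern when the odds ratio required to shift it by one SE is one that could realistically occur in the survey.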

Two-equation model specification

In the analysis above we do not explicitly specify a missing data mechanism (MDM) model via the formula argument of the isniglm function. The same analysis can be replicated by explicitly specifying an MDM model using the code below. The two-equation formula sexact | is.na(sexact) ~ gender*faculty | gender*faculty uses the operator | to separately specify the variables used in the complete-data model and in the MDM. The two-equation formula means that the complete-data model is sexact \(\sim\) gender*faculty and that is.na(sexact) and gender*faculty are the missingness indicator and the missingness predictor \(s\) in the nonresponse model described above, respectively.

ygmodel <- sexact | is.na(sexact)  ~ gender*faculty | gender *faculty
summary(isniglm(ygmodel, family=binomial, data=sos))

## # weights:  5 (4 variable)
## initial  value 4253.151100 
## final  value 4027.954580 
## converged
## 
## Call:
## isniglm(formula = ygmodel, family = binomial, data = sos)
## 
##                          MAR Est.  Std. Err      ISNI      c
## (Intercept)              1.081531  0.055611  0.410141 0.1356
## genderfemale             0.030808  0.079583 -0.038983 2.0415
## facultymdv              -0.733886  0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133  0.206696  0.027542 7.5048

ISNI Analysis for Grouped Binomial Outcome

Because all the covariates in sos are categorical variables, one can also analyze the data as a grouped binomial outcome using the weights argument as below.

# Each gender-by-faculty cell has two rows: nonrespondents, then responders
gender <- c(0,0,1,1,0,0,1,1)
faculty <- c(0,0,0,0,1,1,1,1)
gender <- factor(gender, levels = c(0, 1), labels = c("male", "female"))
faculty <- factor(faculty, levels = c(0, 1), labels = c("other", "mdv"))
# SAcount: number answering yes among responders; NA marks the nonrespondent rows
SAcount <- c(NA, 1277, NA, 1247, NA, 126, NA, 152)
# total: number of students in each row
total <- c(1189,1710,978,1657,68,215,73,246)
sosgrp <- data.frame(gender=gender, faculty=faculty, SAcount=SAcount, total=total)
ymodel <- SAcount/total ~ gender*faculty
sosgrp.isni <- isniglm(ymodel, family=binomial, data=sosgrp, weights=total)

## # weights:  5 (4 variable)
## initial  value 4253.151100 
## final  value 4027.954580 
## converged

summary(sosgrp.isni)

## 
## Call:
## isniglm(formula = ymodel, family = binomial, data = sosgrp, weights = total)
## 
##                          MAR Est.  Std. Err      ISNI      c
## (Intercept)              1.081531  0.055611  0.410141 0.1356
## genderfemale             0.030808  0.079583 -0.038983 2.0415
## facultymdv              -0.733886  0.149215 -0.169859 0.8785
## genderfemale:facultymdv  0.102133  0.206696  0.027542 7.5048
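One way to see why the grouped and individual-level analyses agree is to rebuild the MAR estimates directly from the responder counts. The sketch below uses only the counts listed in the grouped data above and base R; the cell labels and the `coefs` vector are illustrative names.

```r
# Responder rows of the grouped data above: yes counts and responder totals.
yes   <- c(male_other = 1277, female_other = 1247, male_mdv = 126, female_mdv = 152)
total <- c(male_other = 1710, female_other = 1657, male_mdv = 215, female_mdv = 246)

# Empirical log-odds of yes in each cell; qlogis() is the logit function.
lo <- qlogis(yes / total)

# A gender*faculty model is saturated for four cells, so its MAR
# coefficients are contrasts of the cell log-odds.
coefs <- c(
  "(Intercept)" = lo[["male_other"]],
  genderfemale  = lo[["female_other"]] - lo[["male_other"]],
  facultymdv    = lo[["male_mdv"]] - lo[["male_other"]],
  "genderfemale:facultymdv" = lo[["female_mdv"]] - lo[["female_other"]] -
    lo[["male_mdv"]] + lo[["male_other"]]
)
round(coefs, 4)
```

These contrasts can be compared with the MAR Est. column in the tables above; the responder totals also sum to the number of complete observations used by glm().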

A tutorial containing more technical background and examples for longitudinal data

A tutorial describing the ISNI methodology and containing examples of ISNI computation for nonignorable missing data in the longitudinal setting can be downloaded (via)

isni

Hui Xie

2021-08-21
