The sequential probability ratio test (SPRT) was proposed by Wald (1947) and
Wald and Wolfowitz (1948) as a way to do continuous sampling to establish
or raise concerns about product quality. There is a wide literature on
this topic which we do not attempt to summarize here. In clinical
trials, the SPRT for a single-arm binary endpoint can be useful to raise
or alleviate concerns about a short-term endpoint such as occurrence of
an important safety event or, for efficacy, a response rate. The
function binomialSPRT() implements a single-arm version of
the SPRT for a binary outcome. While comparative SPRT tests are also
available for comparing multiple arms, we do not cover those here.
You may think that having a sequential design for a trial obligates you to do an evaluation after every observation. An alternative view is that you can analyze whenever you want and not worry about whether Type I error is controlled.
Consider a single arm where historical data suggest that a positive response to treatment occurs in no more than 10% of patients with currently available treatments. Assume that there is interest in having the trial be well-powered to detect a response rate of 35% with a new treatment. The SPRT is defined as a continuous testing procedure without a maximum sample size. In practice, it is implemented with a minimum and a maximum sample size. For our example we assume a minimum sample size of 10 and a maximum sample size of 25. We will initially set a one-sided Type I error of \(\alpha = 0.08\) and power of 80% (\(1 - \beta = 0.8\)):
library(gsDesign)
b <- binomialSPRT(p0 = .1, p1 = .35, alpha = .08, beta = .2, minn = 10, maxn = 25)
plot(b)
The design plotted above first tests after 10 patients. If at least 4 of 10 have responded, you can reject the null hypothesis of a 10% response rate. If 0 or 1 of 10 have responded, you can conclude that the targeted 35% response rate is not realistic. Note that the number of responses required to cross a bound is a step function of the sample size due to the discrete nature of the problem. We see at the maximum sample size of 25:
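The cutoffs at any sample size, including the maximum of 25, can be checked by hand. The following is a minimal base R sketch, assuming the standard Wald approximations for the boundary constants, \((1-\beta)/\alpha\) for the upper bound and \(\beta/(1-\alpha)\) for the lower bound, with cutoffs rounded up for efficacy and down for futility. The helper `sprt_bounds()` and its column names are hypothetical and not part of gsDesign; the package's own implementation may differ in details.

```r
# Wald SPRT boundary lines for a single-arm binary endpoint (base R only).
# Hypothetical helper for illustration; not a gsDesign function.
sprt_bounds <- function(n, p0, p1, alpha, beta) {
  # In the log-likelihood ratio, each response adds log(p1 / p0) and each
  # non-response adds log((1 - p1) / (1 - p0)). Solving for the number of
  # responses gives two parallel lines in n.
  denom <- log(p1 / p0) - log((1 - p1) / (1 - p0))
  slope <- log((1 - p0) / (1 - p1)) / denom
  upper_intercept <- log((1 - beta) / alpha) / denom
  lower_intercept <- log(beta / (1 - alpha)) / denom
  data.frame(
    n = n,
    futility_at_or_below = floor(lower_intercept + slope * n),
    efficacy_at_or_above = ceiling(upper_intercept + slope * n)
  )
}

bounds <- sprt_bounds(n = c(10, 25), p0 = 0.1, p1 = 0.35, alpha = 0.08, beta = 0.2)
print(bounds)
```

Under these assumptions the sketch gives cutoffs of 1 or fewer (futility) and 4 or more (efficacy) responses at n = 10, in line with the text, and 4 or fewer and 7 or more at n = 25, so that 5 or 6 responses out of 25 would cross neither bound.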
Functions are available to summarize design properties. For example, we can make a power plot:
library(ggplot2)
p <- plot(b, plottype = 2)
p + scale_y_continuous(breaks = seq(0, 90, 10))
The probabilities of the three possible outcomes (crossing the efficacy bound, crossing the futility bound, or crossing neither) are summarized by the underlying response rate.
We now provide a summary table of operating characteristics. Readers can skip the code, but may copy it to produce a similar table.
library(dplyr)
library(tidyr)
# Compute boundary crossing probabilities for selected response rates
b_power <- gsBinomialExact(
k = length(b$n.I), theta = seq(.1, .45, .05), n.I = b$n.I,
a = b$lower$bound, b = b$upper$bound
)
b_power %>%
as_table() %>%
as_gt()
Operating Characteristics for the Truncated SPRT Design
Assumes trial evaluated sequentially after each response

| Underlying response rate | Probability of crossing futility bound | Probability of crossing efficacy bound | Average sample size |
|---|---|---|---|
| 10% | 0.94 | 0.04 | 12.1 |
| 15% | 0.78 | 0.15 | 13.6 |
| 20% | 0.57 | 0.32 | 14.3 |
| 25% | 0.37 | 0.53 | 14.2 |
| 30% | 0.22 | 0.71 | 13.4 |
| 35% | 0.12 | 0.84 | 12.5 |
| 40% | 0.06 | 0.92 | 11.6 |
| 45% | 0.03 | 0.97 | 11.0 |
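The crossing probabilities in the table can be approximately reproduced without the package by tracking the distribution of the response count among trials that have not yet stopped. The following is a minimal base R sketch, assuming Wald boundary lines with constants \((1-\beta)/\alpha\) and \(\beta/(1-\alpha)\) and an evaluation after every patient from 10 through 25. The function `sprt_exact()` is hypothetical and is not how gsBinomialExact() is necessarily implemented.

```r
# Exact boundary crossing probabilities for the truncated SPRT by recursion.
# Hypothetical illustration in base R; not a gsDesign function.
sprt_exact <- function(p, p0 = 0.1, p1 = 0.35, alpha = 0.08, beta = 0.2,
                       minn = 10, maxn = 25) {
  # Boundary lines on the number of responses (assumed Wald approximations)
  denom <- log(p1 / p0) - log((1 - p1) / (1 - p0))
  slope <- log((1 - p0) / (1 - p1)) / denom
  n <- minn:maxn
  upper <- ceiling(log((1 - beta) / alpha) / denom + slope * n)
  lower <- floor(log(beta / (1 - alpha)) / denom + slope * n)

  x <- 0:maxn                  # possible response counts
  mass <- dbinom(x, minn, p)   # distribution at the first evaluation
  futility <- 0
  efficacy <- 0
  for (i in seq_along(n)) {
    if (i > 1) {
      # Enroll one more patient among trials that are still continuing
      mass <- mass * (1 - p) + c(0, mass[-length(mass)]) * p
    }
    stop_low <- x <= lower[i]
    stop_high <- x >= upper[i]
    futility <- futility + sum(mass[stop_low])
    efficacy <- efficacy + sum(mass[stop_high])
    mass[stop_low | stop_high] <- 0   # stopped trials leave the recursion
  }
  c(futility = futility, efficacy = efficacy, neither = sum(mass))
}

res_null <- sprt_exact(0.10)
res_alt <- sprt_exact(0.35)
print(round(rbind(`10%` = res_null, `35%` = res_alt), 2))
```

Under these assumptions, the 10% row should agree with the table (about 0.94 for futility and 0.04 for efficacy), with roughly 0.03 probability of reaching 25 patients without crossing either bound. This third outcome is why the two crossing probabilities in each row of the table do not sum to 1.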
Next we consider a safety monitoring example. Suppose a new treatment has a mechanism of action with the potential for an elevated rate of a specific adverse experience (AE), e.g., serious rash. Suppose that this AE already occurs at a low rate of about 4% in the population proposed for the study and that a 10% rate would be considered unacceptable. While a comparison of the two arms could be considered with an SPRT, we demonstrate here a monitoring bound for the experimental arm only. We assume the proposed sample size for the study is 75 per arm and that we will not stop the trial for serious rash before 4 patients have been studied in the experimental group.
safety_design <- binomialSPRT(p0 = .04, p1 = .1, alpha = .04, beta = .2, minn = 4, maxn = 75)
plot(safety_design)
We see above that if there are no serious rashes in the first 25 experimental group patients, or 1 in the first 40, we reject the 10% rate of concern. On the other hand, if 4 of the first 4 to 14 patients, or 5 of the first 15 to 29 patients, have serious rashes, we can reject the hypothesis that there is no elevation over the presumed 4% population rate.
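The sample sizes quoted above can be cross-checked by inverting the boundary lines rather than reading them off the plot. This is a minimal base R sketch, assuming Wald boundary constants \((1-\beta)/\alpha\) and \(\beta/(1-\alpha)\); all variable names are hypothetical and the calculation is independent of the `safety_design` object.

```r
# Safety monitoring design parameters from the text
p0 <- 0.04
p1 <- 0.10
alpha <- 0.04
beta <- 0.2

# Boundary lines on the number of events (assumed Wald approximations)
denom <- log(p1 / p0) - log((1 - p1) / (1 - p0))
slope <- log((1 - p0) / (1 - p1)) / denom
upper_intercept <- log((1 - beta) / alpha) / denom
lower_intercept <- log(beta / (1 - alpha)) / denom

# Smallest n at which k events still rejects the 10% rate of concern
clear_events <- 0:1
first_n_clear <- ceiling((clear_events - lower_intercept) / slope)

# Largest n at which k events is enough to flag an elevated rate
flag_events <- 4:5
last_n_flag <- floor((flag_events - upper_intercept) / slope)

print(data.frame(events = clear_events, first_n_clear = first_n_clear))
print(data.frame(events = flag_events, last_n_flag = last_n_flag))
```

Under these assumptions, 0 and 1 events clear the concern from n = 25 and n = 40 onward, and 4 and 5 events flag an elevated rate through n = 14 and n = 29, respectively, matching the values read from the plot.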
The design operating characteristics are now summarized both in a plot and a table.
plot(safety_design, plottype = 2)