This document will present a elaborate understanding of the
`fPASS`

R package and its capabilities. First we need to load
the current development version of the package using the following code.
The package is designed to conduct for Power and Sample Size analysis
(PASS) for Projection-based Two-Sample test for functional data. Please
see Wang (2021) and Koner and Luo (2023) for the
details of the testing procedure and its advantage over the current
state-of-the-art testing procedures used for longitudinal data such as
mixed model and GEE.

The key function to use for finding the power and sample size (PASS)
for the test is `fPASS::PASS_Proj_Test_ufDA()`

. In the
following, we will see how the function can be used to conduct PASS for
the Projection-based test for longitudinal and functional data under
different covariance structure in order to design a randomized clinical
trials (RCT). We will present the examples of using the function with
different case studies.

Suppose that we want to conduct a clinical trial to test the efficacy
of a treatment effect against placebo, where the visit times for each
subjects are at baseline, 3 months, 6 months, and every 6 months
thereafter for 24 months. Assume that for each visit (except for the
baseline), there is an window of 15 days around the stipulated visit
time when the patients can appear to the clinic. This means that, a
patient can come to the clinic for third visit around \(5.5 - 6.5\) months from baseline. Assuming
that the higher the response, the better is the treatment, it is
hypothesized that the effect size (mean to variance ratio) between the
treatment and placebo will be \(1\)
unit each year, and the treatment effect follows a linear trend.
Further, it is assumed that the longitudinal response trajectory for
each subject has a marginal standard deviation about 5 units at each
visit time and assumes a dying correlation which can be characterized a
conditional auto regressive process with correlation parameter \(0.5\). Suppose that after collection of
data, we want to use the Two-sample projection-based
test by Wang (2021) to assess the treatment efficacy. Assuming that
we can have equal number of subjects in each of the treatment and the
placebo group, we want to find the minimum sample size required to test
the treatment efficacy with a power of \(80\%\) at a significance level of \(5\%\), using the routine
`fPASS::PASS_Proj_Test_ufDA()`

.

`fPASS::PASS_Proj_Test_ufDA()`

The argument of the function can be obtained by running
`?fPASS::PASS_Proj_Test_ufDA()`

in R. The key arguments of
are `mean_diff_fnm`

denoting the group difference function,
`obs.design`

specifying the observation design and
`cov.par`

specifying the covariance structure of the latent
trajectory. We will demonstrate how to supply each of the parameters in
the function from the statement of the problem.

`obs.design`

and`nobs_per_subj`

: The number of observation points for each subject are pre-determined, and they are not randomly varying among each other. This constitutes that it falls under`longitudinal`

design, and the visit times are \(0, 3, 6, 12, 18\) and \(24\) months with a window of \(0.5\) months for each visit other than baseline. The`obs.design`

argument should be set as

```
obs.design <- list("design" = "longitudinal",
"visit.schedule" = c(3,6,12,18,24), # pre-determined visit times, other than baseline
"visit.window" = 0.5) # visit window
nobs_per_subj <- length(obs.design$visit.schedule) + 1 # number of observation per subject is 6.
```

`cov.par`

and`cov.type`

: The conditional auto regressive process with correlation parameter \(0.5\) can be characterized by the`nlme::corCAR1()`

structure available in`nlme::corClasses`

. Therefore, we can characterize the correlation as

```
# The function will internally create a data set
# and a factor variable Subject denoting subject id.
# You can change time and Subject by any other name as well.
cor.str <- nlme::corCAR1(0.5, form = ~ time | Subject);
# The marginal sd at each time-point is assumed to be 5.
cov.par <- list("var" = 25, "cor" = cor.str)
# We must set cov.type = "ST" for any covariance
# structure belonging to nlme::corClasses.
cov.type <- "ST"
```

`mean_diff_fnm`

: Assuming that the group difference function has a linear trend, and that the group difference is 1 unit every year, which means the group difference is 4 unit after two years (24 months). As mentioned in the details in the argument section, internally, the function scales the specified visit times into \([0,1]\), so that baseline is 0 and 24 months is 1, therefore, the mean function can be written as

```
mean_diff_fn <- function(t){2*t}
mean_diff_fnm <- "mean_diff_fn" # the name of mean difference function needs to
# be specified with the argument.
```

`missing_type`

and`missing_percent`

: We assume no missing observation at each visit, so we will set the missing value related parameters as follows:

`alloc.ratio`

: Assuming that we have an equal allocation ratio of the samples in each group,

`target.power`

and`sig.level`

: We want to find the sample size required to achieve a power of \(80\%\) of test at significance level \(5\%\),

`eval_SS`

: The number of independent and identically distributed (i.i.d) replication of subjects based on which the eigencomponents of the covariance process will be estimated empirically. This should be set as a large number in order to correctly estimate the true number of eigenfunctions and to get enough precision on the estimated eigencomponents. Setting a small value of`eval_SS`

might lead to incorrect detection of large number of eigenfunctions, and then the power of the projection-based test will incorrectly inflated because the dimension and the magnitude of projection-vector gets arbitrarily large, as a result of projecting the mean difference function on additionally detected eigenfunctions (possibly with very small eigenvalues). Users must remember that the**computation time of the function is directly proportional to the size of**, which is described next item.`eval_SS`

, so they must expect that the run time will be higher if`eval_SS`

is larger, especially when`fpca.method`

is set as ‘face’

`fpca_method`

: The module for functional principal component analysis that will be used to estimate the eigencomponents, can be any one of ‘fpca.sc’, in which case`fPASS::fpca.sc()`

to be used or ‘face’ for`face::face.sparse`

function to be used as the primary eigenfunction estimation routine. Note that,`fPASS::fpca.sc()`

is just a copy of`refund::fpca.sc()`

routine, except here the`shrinkage`

scores (Yao et. al. 2005, JASA) are correctly estimated, and the issue of`NA`

value for the scores when the measurement error variance are estimated to be zero, are corrected.

`sigma2.e`

: Set a small but non-zero measurement error to ensure nugget effect and stability in inversion of covariance

`fpca_optns`

: The argument`fpca_optns`

must be a named list with elements that could specified as arguments of either of`fPASS::fpca.sc()`

function or`face::face.sparse()`

depending on the choice of the argument`fpca_optns`

. For example, the default value of`pve`

argument is \(0.99\), but the user can overwrite by setting it as follows. See the details of argument in the help page for what are compatible names for the elements in the list. A higher value of percentage of variation explained might lead to detection of extra eigenfunctions with close to zero eigenvalues, which might lead to inflated power of the test for fixed sample size.

`mean_diff_add_args`

: The user can specify any additional parameter in form of a named list that might need to be specified in the mean different function specified in the argument`mean_diff_fnm`

. In this case, there is nothing, so it will be specified as an empty list as follows:

`nWgrid`

: The length of grid points used to estimate the eigenfunction and to approximately compute the projection \(\int \mu_1(t) - \mu_2(t)\phi_k(t) dt\). We keep the default value of`nWgrid`

, \(201\).

`nsim`

: The number of samples to be generated from the alternate distribution of the Hotelling \(T^2\) statistic to accurate compute the power. The default value is set at 10000, but the user can change it in the function. In our example, we have used`nsim`

\(=1000\) for faster computation of the vignette.

`fPASS::fpca.sc()`

function:Now that we have explained how to specify each and every arguments of
`fPASS::PASS_Proj_Test_ufDA()`

function, we will compute the
minimum sample size required. However, as the testing procedure
internally computes the eigenfunctions based on large number of samples
and then uses the estimated eigenfunctions to compute the PASS, the
computed power and the minimum sample size will be different due to the
fact that the estimated eigenfunctions are different at each run because
of sampling variation. Therefore, it is important to run the function
`fPASS::PASS_Proj_Test_ufDA()`

for few iterations to see how
the sampling variation affects the computed power or sample size. In
this example, we run the same function in parallel for about 10 times,
and present the percentiles of the estimated sample size. It is
recommended to run the the function at least about 50 to 100 times and
use the median of these computed sample sizes obtained after a
reasonable number of iteration, as a point estimate of the minimum
sample size and the interquartile range (IQR) as a valid measure of
range of the sample size that could be considered. From a practical
perspective, depending on the budget of the trial, one can choose the
minimum number of subjects that will be enrolled for the trial based on
the value of IQR.

```
library(foreach)
required_sample_size <- foreach(i=1:10, .combine='c', .packages = "fPASS") %do% {
mean_diff_fn <- mean_diff_fn
PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
sig.level = sig.level, nobs_per_subj = nobs_per_subj,
obs.design = obs.design, mean_diff_fnm = "mean_diff_fn",
cov.type = cov.type, cov.par = cov.par,
sigma2.e = sigma2.e, missing_type = missing_type,
missing_percent = missing_percent, eval_SS = eval_SS,
alloc.ratio = alloc.ratio, fpca_method = fpca_method,
mean_diff_add_args = mean_diff_add_args,
fpca_optns = fpca_optns, npc_to_use = NULL,
nsim = 1e3)$required_SS
}
```

```
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)
#> 25% 50% 75%
#> 155.4684 162.3803 169.8949
```

We have intentionally left out the discussion of
`npc_to_use`

argument for the
`fPASS::PASS_Proj_Test_ufDA()`

function. The number of
eigenfunction to use to compute the PASS is not before hand. Therefore,
for a specified sampling design and covariance structure of the response
trajectory, it is not possible to to provide an appropriate value unless
the user has an substantial information about the eigencomponents
before. This is where the function
`fPASS::Extract_Eigencomp_fDA()`

can assist. If we look at
the help page of that function, you can notice that except for one
argument, all of arguments of that function matches that of
`fPASS::PASS_Proj_Test_ufDA()`

. Therefore, for a specified
sampling design and covariance parameter we can take a look the
estimated eigenfunctions and eigenvalues to have a more informed
decision how many eigencomponents to use to in the main function to
conduct the PASS analysis. Here is an example of concering to in our
case study.

`fPASS::fpca.sc()`

function```
fpca_method <- "fpca.sc"
library(foreach)
eigenlist <- foreach(i=1:3, .packages = "fPASS") %do% {
mean_diff_fn <- mean_diff_fn
eigencomp <- Extract_Eigencomp_fDA(nobs_per_subj = nobs_per_subj,
obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
cov.type = cov.type, cov.par = cov.par,
sigma2.e = sigma2.e, missing_type = missing_type,
missing_percent = missing_percent, eval_SS = eval_SS,
alloc.ratio = alloc.ratio, fpca_method = fpca_method,
mean_diff_add_args = mean_diff_add_args,
fpca_optns = fpca_optns,
data.driven.scores = FALSE)
eigencomp[c("working.grid", "est_eigenfun", "est_eigenval")]
}
```

```
print(lapply(eigenlist, function(x) x$est_eigenval))
#> [[1]]
#> [1] 19.876154 2.446943
#>
#> [[2]]
#> [1] 19.662736 2.346524
#>
#> [[3]]
#> [1] 19.545208 2.308516
```

For a specified PVE of \(95\%\),
`fPASS::fpca.sc()`

function detects two leading eigenfunction
with non-zero eigenvalues for the covariance structure in most of the
cases. Next, we see what happens by using
`face::face.sparse()`

function for the same PVE The
`face::face.sparse()`

function is much slower than the
`fPASS::fpca.sc()`

, so we have kept `eval_SS`

argument at \(500\) to not
significantly increase the run time of the vignette. The user are
advised to use a large value of `eval_SS`

in order to get
high enough precision for the estimated eigenfunction.

`face::face.sparse()`

function```
fpca_method <- "face"
# Setting eval_SS to 500, specifically for
# the face::face.sparse() function
# to ensure that the vignette builds within
# real time. The user must set a much larger value
# of eval_SS e.g. ranging between 2000-5000, to ensure
# enough precision accuracy for the eigenfunctions.
eval_SS <- 500 # increase it to 2000 for practical case
library(foreach)
# library(doParallel)
# cl <- makeCluster(detectCores()-1)
# registerDoParallel(cl)
eigenlist <- foreach(i=1:3, .packages = "fPASS") %do% {
mean_diff_fn <- mean_diff_fn
eigencomp <- Extract_Eigencomp_fDA(nobs_per_subj = nobs_per_subj,
obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
cov.type = cov.type, cov.par = cov.par,
sigma2.e = sigma2.e, missing_type = missing_type,
missing_percent = missing_percent, eval_SS = eval_SS,
alloc.ratio = alloc.ratio, fpca_method = fpca_method,
mean_diff_add_args = mean_diff_add_args,
fpca_optns = fpca_optns,
data.driven.scores = FALSE)
eigencomp[c("working.grid", "est_eigenfun", "est_eigenval")]
}
# stopCluster(cl)
# matplot(eigencomp$working.grid, eigencomp$est_eigenfun, type="l",
# xlab = "timepoints (scaled)", ylab = "eigenfunctions")
```

```
print(lapply(eigenlist, function(x) x$est_eigenval))
#> [[1]]
#> [1] 21.4722854 2.4345531 0.7338938
#>
#> [[2]]
#> [1] 18.7551799 2.2752391 0.5402706
#>
#> [[3]]
#> [1] 19.8852031 2.1993433 0.4408327
```

We notice that `face::face.sparse()`

function is detecting
three eigenfunction in most of the cases, for the same setup. However,
the eigenvalue corresponding to the last eigenfunction seems to be
small, and that is possibly the reason it was not detected by
`fPASS::fpca.sc()`

function. This provides us a more informed
decision about the number of eigenfunctions to use. So, technically we
should discard the last eigenfunction to compute the projection of mean
difference onto the eigenfunctions, and use `npc_to_use = 2`

in the `fPASS::PASS_Proj_Test_ufDA`

function, so that the
power function is not overinflated by detection of extra eigenfunctions.
However, for better understanding of the reader, we will still use
`npc_to_use = 3`

for `fpca_method == 'face'`

to
find out how the distribution of minimum sample size required is
allowing an extra eigenfunction (with possibly zero eigenvalues) for
`fpca_method == 'face'`

. The number of replication is kept at
10, to ensure that the vignette runs within a reasonable amount of time,
please increase it for more comprehensive understanding of the effect of
estimating eigenfunctions on range of the minimum sample size
required.

`face::face.sparse()`

function with pre-specified number of
eigenfunction.```
fpca_method <- "face"
library(foreach)
required_sample_size <- foreach(i=1:5, .combine='c', .packages = "fPASS") %do% {
mean_diff_fn <- mean_diff_fn
PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
sig.level = sig.level, nobs_per_subj = nobs_per_subj,
obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
cov.type = cov.type, cov.par = cov.par,
sigma2.e = sigma2.e, missing_type = missing_type,
missing_percent = missing_percent, eval_SS = 500,
alloc.ratio = alloc.ratio, fpca_method = fpca_method,
mean_diff_add_args = mean_diff_add_args,
fpca_optns = fpca_optns, npc_to_use = npc_to_use,
nsim = 1e3)$required_SS
}
```

```
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)
#> 25% 50% 75%
#> 64.72774 82.47207 240.18890
```

As expected, using a higher value of `npc_to_use`

leads to
over inflation of power, and consequently computed sample size is
significantly less than which we obtained for
`fpca_method = 'fpca.sc'`

. But, this comes at cost of over
inflation of type I error of the test. Therefore, for a specified
correlation structure of the response, we must take extra care to NOT
use unnecessary eigenfunctions, to avoid the risk of inflation of
inflated type I error.

Under the same setup, if We want to compute the minimum sample size required when the percentage of missing observation at each time point is around \(25\%\), then we can see the distribution of sample size required after running the function about 100 times as follows.

```
fpca_method <- "fpca.sc"
eval_SS <- 5000
library(foreach)
required_sample_size <- foreach(i=1:10, .combine='c', .packages = "fPASS") %do% {
mean_diff_fn <- mean_diff_fn
PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
sig.level = sig.level, nobs_per_subj = nobs_per_subj,
obs.design = obs.design, mean_diff_fnm = "mean_diff_fn",
cov.type = cov.type, cov.par = cov.par,
sigma2.e = sigma2.e, missing_type = missing_type,
missing_percent = missing_percent, eval_SS = eval_SS,
alloc.ratio = alloc.ratio, fpca_method = fpca_method,
mean_diff_add_args = mean_diff_add_args,
fpca_optns = fpca_optns, npc_to_use = 2, nsim = 1e3)$required_SS
}
```

```
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)
#> 25% 50% 75%
#> 139.3561 147.7104 178.8790
```

The IQR of the distribution of the minimum sample sizes when the
percentage of missing is as significant as \(25\%\) is still about the same as that when
there is no missing, i.e. `missing_type == 'nomiss'`

and
`missing_percent = 0`

. This reflects one of the biggest
advantage of the projection-based test compared to traditional mixed
model approaches, that as long as the missing percentage do not affect
the estimation of the eigencomponents, the power of the test does not
degrade with the increasing percentage of missing responses.

Do try it out the same example in NCSS PASS software and check out
the minimum sample size required for the test to achieve a power of
\(80\%\), and how does the minimum
sample size increases when you increase the missing percentage to \(25\%\), to get a fair comparison of the
effectiveness of our procedure compared to traditional GEE type tests.
To do this, go to NCSS PASS (2023) software, *GEE -> GEE Tests for
the Slope of Two groups in a Repeated Measures Design (Continuous
Outcome) -> Choose AR(1) proportional correlation structure and
Number of measurement times as 6.*

We report the total computation time it needed create the entire vignette is approximately 3 minutes.