For an open access tutorial paper explaining how to set equivalence
bounds, and how to perform and report equivalence tests for ANOVA models, see
Campbell and Lakens (2021). The functions described here
are meant to be omnibus tests, and additional testing may be
necessary^{1}.

Statistical equivalence testing (or “omnibus non-inferiority testing”,
as stated by Campbell and Lakens (2021))
for *F*-tests is a special use case of the cumulative distribution
function of the non-central *F* distribution.

As Campbell and Lakens (2021) state, this type of test answers the question: “Can we reject the hypothesis that the total proportion of variance in outcome Y attributable to X is greater than or equal to the equivalence bound \(\Delta\)?”

\[ H_0: \space 1 > \eta^2_p \geq \Delta \\ H_1: \space 0 \leq \eta^2_p < \Delta \]

In `TOSTER`, we go a step further and calculate a more general form of
the non-centrality parameter so that the
equivalence test for *F*-tests can be applied to a variety of
designs.

Campbell and Lakens (2021) calculate
the *p*-value as:

\[ p = p_f(F; J-1, N-J, \frac{N \cdot \Delta}{1-\Delta}) \]

This formulation cannot be applied directly to factorial ANOVA; the paper only outlines how to apply the approach to a one-way ANOVA, with an extension to Welch’s one-way ANOVA.

However, the non-centrality parameter (ncp = \(\lambda\)) can be calculated from the equivalence bound and the degrees of freedom:

\[ \lambda_{eq} = \frac{\Delta}{1-\Delta} \cdot(df_1 + df_2 +1) \]
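
For the one-way design in Campbell and Lakens (2021), with \(J\) groups and \(N\) observations, the degrees of freedom are \(df_1 = J - 1\) and \(df_2 = N - J\), so the general form reduces to their non-centrality parameter:

\[ df_1 + df_2 + 1 = (J-1) + (N-J) + 1 = N \quad \Rightarrow \quad \lambda_{eq} = \frac{N \cdot \Delta}{1-\Delta} \]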

The *p*-value for the equivalence test (\(p_{eq}\)) could then be calculated from
traditional ANOVA results and the distribution function:

\[ p_{eq} = p_f(F; df_1, df_2, \lambda_{eq}) \]
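
The two formulas above can be sketched in base R with `stats::pf`, which accepts a non-centrality parameter through its `ncp` argument. This is a minimal illustration rather than the `TOSTER` implementation; the variable names are chosen for this example, and the input values are those of the one-way ANOVA on `InsectSprays` used in this document, with an equivalence bound of 0.35.

```r
# Summary statistics from a one-way ANOVA (InsectSprays: count ~ spray)
Fstat <- 34.70228
df1 <- 5
df2 <- 66

# Equivalence bound on partial eta-squared (an arbitrary choice)
eqbound <- 0.35

# Non-centrality parameter implied by the bound
lambda_eq <- eqbound / (1 - eqbound) * (df1 + df2 + 1)

# Equivalence p-value: lower tail of the non-central F distribution
p_eq <- pf(Fstat, df1 = df1, df2 = df2, ncp = lambda_eq)

lambda_eq
p_eq
```

A large `p_eq` means the observed *F* is too large to reject the hypothesis that \(\eta^2_p\) is at or above the bound.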

Using the `InsectSprays` data set in R and the base R `aov` function, we
can demonstrate how this omnibus
equivalence testing can be applied with `TOSTER`.

From the initial analysis we can see a clear “significant” effect of the
factor spray (the *p*-value is displayed as zero, but it is just very small).
However, we *may* be interested in testing whether the effect is
practically equivalent. I will arbitrarily set the equivalence bound to
a partial eta-squared of 0.35 (\(H_0: \eta^2_p \geq 0.35\)).

```
library(TOSTER)
# Get Data
data("InsectSprays")
# Build ANOVA
aovtest = aov(count ~ spray,
              data = InsectSprays)
# Display overall results
knitr::kable(broom::tidy(aovtest),
             caption = "Traditional ANOVA Test")
```

term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
spray | 5 | 2668.833 | 533.76667 | 34.70228 | 0 |
Residuals | 66 | 1015.167 | 15.38131 | NA | NA |

We can then use the information in the table above to perform an
equivalence test using the `equ_ftest` function. This
function returns an object of the S3 class `htest`, and the
output will look very familiar to that of the t-test. The main difference is
that the estimate and confidence interval are for partial eta-squared (\(\eta^2_p\)).
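
The call itself is not shown in this rendering. A sketch of what it plausibly looks like, assuming `equ_ftest` takes the arguments `Fstat`, `df1`, `df2`, and `eqbound` (check `?equ_ftest` for the exact signature in your installed version):

```r
library(TOSTER)

# Summary statistics copied from the ANOVA table above;
# argument names are assumed, see ?equ_ftest
res <- equ_ftest(Fstat = 34.70228,
                 df1 = 5,
                 df2 = 66,
                 eqbound = 0.35)
res
```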

```
##
## Equivalence Test from F-test
##
## data: Summary Statistics
## F = 34.702, df1 = 5, df2 = 66, p-value = 1
## 95 percent confidence interval:
## 0.5806263 0.7804439
## sample estimates:
## [1] 0.724439
```
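
The sample estimate in the output can be checked by hand. Partial eta-squared can be recovered either from the *F* statistic and its degrees of freedom or from the sums of squares in the ANOVA table. A base R sketch, with variable names chosen for this example:

```r
# From the F statistic and degrees of freedom
Fstat <- 34.70228
df1 <- 5
df2 <- 66
pes_from_f <- (Fstat * df1) / (Fstat * df1 + df2)

# From the sums of squares (effect and residual) in the ANOVA table
ss_effect <- 2668.833
ss_resid <- 1015.167
pes_from_ss <- ss_effect / (ss_effect + ss_resid)

round(c(pes_from_f, pes_from_ss), 6)
```

Both routes agree with the `0.724439` reported under `sample estimates`.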

Based on the results above we would conclude there is a significant
effect of “spray” and the differences due to spray are *not*
statistically equivalent. In essence, we reject the traditional null
hypothesis of “no effect” but fail to reject the null hypothesis of the
equivalence test.

The `equ_ftest` function is very useful because all you need are
very basic summary statistics. However, if you are doing all your
analyses in R then you can use the `equ_anova` function. This
function accepts objects produced from `stats::aov`,
`car::Anova`, and `afex::aov_car` (or any ANOVA
from `afex`).

```
## effect df1 df2 F.value p.null pes eqbound p.equ
## 1 spray 5 66 34.70228 3.182584e-17 0.724439 0.35 0.9999965
```
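
A table like the one above would come from passing the fitted model directly to `equ_anova`. The sketch below assumes the function takes the model object as its first argument and the bound as `eqbound` (see `?equ_anova`):

```r
library(TOSTER)

# Refit the one-way ANOVA used throughout this document
data("InsectSprays")
aovtest <- aov(count ~ spray, data = InsectSprays)

# Argument name `eqbound` is assumed; the result is printed as a data frame
res <- equ_anova(aovtest, eqbound = 0.35)
res
```

Here `p.null` is the traditional ANOVA *p*-value and `p.equ` is the equivalence test *p*-value for each effect.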

Just like the standardized mean differences, `TOSTER` also
has a function to visualize \(\eta^2_p\).

The function, `plot_pes`, operates in a fashion very
similar to `equ_ftest`. In essence, all you have to do is
provide the F-statistic, numerator degrees of freedom, and denominator
degrees of freedom. We can also select the type of plot with the
`type` argument. Users have the option of producing a
consonance plot (`type = "c"`), a consonance density plot
(`type = "cd"`), or both (`type = c("cd","c")`).
By default, `plot_pes` will produce both plots.
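
As a sketch, the `InsectSprays` result could be plotted as follows. The argument names `Fstat`, `df1`, and `df2` are assumed to mirror those of `equ_ftest`; consult `?plot_pes` before relying on them.

```r
library(TOSTER)

# Summary statistics from the one-way ANOVA on InsectSprays;
# argument names are assumed, see ?plot_pes
p <- plot_pes(Fstat = 34.70228,
              df1 = 5,
              df2 = 66,
              type = c("cd", "c"))
p
```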

Power for an equivalence *F*-test can be calculated with the
same equations supplied by Campbell and Lakens
(2021). I have included these within the `power_eq_f` function.

`## Note: equ_anova only validated for one-way ANOVA; use with caution`

```
##
## Power for Non-Inferiority F-test
##
## df1 = 2
## df2 = 60
## eqbound = 0.15
## sig.level = 0.05
## power = 0.8188512
```
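
The logic behind such a power calculation can be sketched in base R for the settings shown above (`df1 = 2`, `df2 = 60`, `eqbound = 0.15`, `sig.level = 0.05`). This sketch assumes the true effect is exactly zero, so that the observed *F* follows a central *F* distribution; it illustrates the idea and is not necessarily what `power_eq_f` does internally.

```r
alpha <- 0.05
df1 <- 2
df2 <- 60
eqbound <- 0.15

# Non-centrality parameter implied by the equivalence bound
lambda_eq <- eqbound / (1 - eqbound) * (df1 + df2 + 1)

# The equivalence test rejects when the observed F falls below this value
F_crit <- qf(alpha, df1 = df1, df2 = df2, ncp = lambda_eq)

# Power: probability of observing F below F_crit when the true effect is zero
power <- pf(F_crit, df1 = df1, df2 = df2)
power
```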

Campbell, Harlan, and Daniël Lakens. 2021. “Can We Disregard the
Whole Model? Omnibus Non-Inferiority Testing for R2 in Multi-Variable
Linear Regression and in ANOVA.” *British Journal of
Mathematical and Statistical Psychology* 74 (1): e12201. https://doi.org/10.1111/bmsp.12201.

^{1} Russ Lenth’s `emmeans` R package has some capacity for equivalence testing on the marginal means (i.e., a form of pairwise testing). See the `emmeans` package vignettes for details.