neatStats example use: pipeline from raw data to reportable statistics

Material

The data used for the example is available at https://osf.io/49sq5/, under the example_data folder. The data was created artificially to simulate results from a hypothetical experiment. Each file represents the results of a participant who completed a task with positive and negative words (e.g. “happy”, “peaceful” or “terror”, “evil”) displayed in green or red, classifying each word with key presses according to valence (positive vs. negative). The hypotheses are that (a) negative words have slower responses (lower response times; RTs) in general (Valence main effect), and (b) positive words displayed in red and negative words displayed in green have slower responses (Valence x Color interaction). Error rates (ERs) should follow similar pattern (i.e. more incorrect responses are expected in cases where slower responses are expected). However, for about half of the participants, the green and red words were presented in separate blocks: all words shown first in green, and then again all words in red, or vice versa (all in red, then all in green). For these participants, color effects should be absent (due to no polarity): no Valence x Color interaction expected in this group.

Each participant’s data is given as a simple text file with .txt extension, whose file name contains the experiment title “expsim_color_valence”, the given condition (“separate” or “mixed”), and the subject number (1-180), hence, for example, “expsim_color_valence_mixed_1.txt”.

In this simplified case, each file contains the following data columns:

subject_num: subject number;
condition: “mixed” for the task with red and green colored words randomly mixed, and “separate” for the task with the two colors separated into two blocks;
rt: RTs, in ms, of the response to each stimulus;
response: “correct” for correct key pressed, “incorrect” for wrong key pressed, or “tooslow” for waiting too long with the response;
color: color of each given presented stimulus (“green” or “red”);
valence: color of each given presented stimulus (“positive” or “negative”);
age: age of the participant;
gender: gender of the participant (1, for male, or 2, for female).

(The values for subject_num, condition, age, and gender are of course constant for each participant, i.e., have same value in every row.)

(Fig. 1) Raw data

The script that is below presented step by step and described in detail, is also available with only brief inline comments at https://osf.io/49sq5/ (as “example_analysis.R”).

Processing the data

First load the library. (If not yet installed, just run install.packages("neatStats"))

library('neatStats')

Then, set, as current working directory, the path to the folder that contains the data files. If you use RStudio, the path_neat() function returns the path of the script file from which this function is executed. This function takes a single argument, which will simply be appended to the path. For example, if the data files are placed into a folder, named “example_data”, next to the analysis script, then the path_neat('example_data') will return the full path to that folder. Therefore, you can set the current working directory as follows.

setwd(path_neat('example_data'))

If you are using anything other than RStudio, you need to give the full path manually, for example, as setwd("C:/research/proj_neatstats/example_data"). (To note, path_neat() is the only function in the neatStats package, as well as this Example document, that requires RStudio.)

To collect all file names in the directory, you can use the list.files() function. Since all result file names start with “expsim_color_valence_”, and end with “.txt”, you can specify this pattern to make sure that no other files that may be in this folder are collected.

filenames = list.files(pattern = "^expsim_color_valence_.*txt$")

Now that you have the list of all the file names (in the filenames variable), you can loop through it, and, for each file name, read in the data from the corresponding file and extract the data that you need. The data from the participants will be merged together in one data frame, named subjects_merged, which will contain in each of its row the extracted data of a single participant: namely, condition, age, gender, as well as the mean (aggregated) RTs and error rates for each stimulus type; as shown below.

(Fig. 2) Aggregated data

Detailed explanation follows the code below.

for (file_name in enum(filenames)) {
    cat(file_name, fill = TRUE)
    subject_data = data.table::fread(
        file_name[2],
        fill = TRUE
    )
    if (nrow(subject_data) != 100) {
        stop("unexpected trial number: ", nrow(subject_data))
    }
    rts = aggr_neat(
        subject_data,
        rt,
        group_by = c('color', 'valence'),
        method = mean,
        prefix = 'rt',
        filt = (rt > 150 & response == 'correct')
    )
    ers = aggr_neat(
        subject_data,
        response,
        group_by = c('color', 'valence'),
        method = 'incorrect',
        prefix = 'er',
        filt = (response %in% c('correct', 'incorrect'))
    )
    er_overall = aggr_neat(subject_data,
                           response,
                           method = 'incorrect',
                           filt = (response %in% c('correct', 'incorrect')))$aggr_value
    rbind_loop(
        subjects_merged,
        subject_id = subject_data$subject_num[1],
        condition = subject_data$condition[1],
        gender = subject_data$gender[1],
        age = subject_data$age[1],
        er_overall = er_overall,
        rts,
        ers
    )
}

The enum() function prepends numbering to the file names (1 for first, 2 for second, etc.; this can be disabled via the enumerate parameter) merely for display, and, more importantly, it indicates a newly initiated loop for the rbind_loop function (see later).

A preliminary note: to test the code within the loop in detail, we can assign a single file name to the file_name variable (e.g., as file_name = c(0, 'expsim_color_valence_mixed_1.txt')), and then proceed to execute the following lines one by one and check the corresponding results.

The cat(file_name, fill = TRUE) line just prints the present file name to the console, to let you know which file is currently being processed. This is especially useful when the script is stopped due to an error: in that case you know which file caused the error.

cat(file_name, fill = TRUE)

Then the data.table::fread() function reads in the data of the current file. (This is similar to but more convenient and probably faster than read.table.) I always set fill = TRUE as a precaution, although it should not matter in the present data; see ?data.table::fread for details.

subject_data = data.table::fread(
    file_name[2],
    fill = TRUE
)

Next, as a quick check to ensure the basic integrity of the data, I always verify that the data contains the expected amount of rows (i.e., the number of trials in the experiment). Otherwise the script is stopped an the “unexpected trial number” (i.e., number of rows) is printed.

if (nrow(subject_data) != 100) {
    stop("unexpected trial number: ", nrow(subject_data))
}

We can then get the mean RTs and ERs, per each stimulus type, using aggr_neat(). Here there are four stimulus types: (1) positive words in green, (2) positive words in red, (3) negative words in green, and (4) negative words in red. These four combinations can be obtained by grouping by the color and valence columns. For RTs, we need to get the mean of the values from the rt column (for each stimulus type). The method is mean by default, I write it out explicitly only for clarity. Without giving prefix, the default output names for item types would be like, for example, green_negative, red_positive, etc. (automatically derived from the group_by arguments). To clarify that this is our RT measure, we can add ‘rt’ as prefix, so that the item type names will be as, for example, rt_green_negative. Finally, since we are only interested in the RTs of correct responses, we filter for response == 'correct', and, since RTs below 150 ms are probably just accidental (as such fast reactions are extremely unlikely), we also filter for rt > 150.

rts = aggr_neat(
    subject_data,
    rt,
    group_by = c('color', 'valence'),
    method = mean,
    prefix = 'rt',
    filt = (rt > 150 & response == 'correct')
)

The procedure is similar for ERs, except that the aggr_neat() has a special method for getting ratios of specific values (which are otherwise not straightforward with a single function): whenever the argument for the method parameter is a string (i.e., text in quotation marks; character mode), the ratio of occurrences of the given string (in this case, the text "incorrect") in the specified column (here: response) is returned (for each stimulus type). Since we usually do not want to include too slow responses in the calculation of error rates, we can filter for response %in% c('correct', 'incorrect'), and thereby get the ratio of the number of incorrect responses as compared to the number of correct and incorrect responses. (Of course, in this case the same could be achieved by a filter response != 'tooslow', but perhaps the other one is clearer.) As always, you can check ?aggr_neat for more details.

ers = aggr_neat(
    subject_data,
    response,
    group_by = c('color', 'valence'),
    method = 'incorrect',
    prefix = 'er',
    filt = (response %in% c('correct', 'incorrect'))
)

The aggr_neat() function returns the value names (e.g., rt_green_negative) and values as columns (with column names aggr_group and aggr_value). The RT and ER values will eventually be transformed and merged into a single line together with the other subject information.

We would also like to get the overall ER (regardless of stimulus type), because we want to exclude participants with generally very high ER. (They may not have been paying attention or had undisclosed vision problems, etc.). For this too, we can use the aggr_neat() function, only omitting the group_by argument, and appending, at the end, $aggr_value, in order to access the single value returned under this column. To note, the same value can also be quite easily obtained without aggr_neat(), by writing, for example, nrow(subject_data[subject_data$response == 'incorrect',]) / nrow(subject_data[subject_data$response %in% c('correct', 'incorrect'),]). But again, using aggr_neat() might be clearer.

er_overall = aggr_neat(subject_data,
                       response,
                       method = 'incorrect',
                       filt = (response %in% c('correct', 'incorrect')))$aggr_value

Finally, we also want the subject number, condition, age, and gender. These latter variables are constant in their respective columns, so we might as well take them from any row, for example the first row (e.g., for subject_num: subject_data$subject_num[1]).

We can now use the rbind_loop function that initiates a given data frame (here: subjects_merged) at the first cycle of the loop (internally detected via enum()) and adds a new row in each cycle by “intelligently” merging all provided data and transforming them into a single row (see ?rbind_loop for details).

rbind_loop(
    subjects_merged,
    subject_id = subject_data$subject_num[1],
    condition = subject_data$condition[1],
    gender = subject_data$gender[1],
    age = subject_data$age[1],
    er_overall = er_overall,
    rts,
    ers
)

When running the full for loop, the above described steps are repeated for each data file. After the loop has been run, the subjects_merged data frame is ready for statistical analysis. It might be worth noting that, while there is a “subject_id” column in this subjects_merged data frame, this is merely to keep track of records, but none of the statistical functions below require it: each participant is represented by a single row in the data frame, hence no additional identifier is needed.

Statistics

At this point you might want to list column names, using str(subjects_merged), for a quick overview of the content as well as for the convenient copy-pasting for subsequent use in statistical functions.

Before any tests, we exclude subjects with overall error rate larger than 20%.

data_final = excl_neat(subjects_merged, er_overall < 0.20, group_by = 'condition')

This automatically calculates and prints the number of exclusions and number of remaining participants per condition, showing three exclusions in the ‘mixed’ condition, and two in the ‘separate’ condition, leaving 87 and 89, respectively.

Moving on to the first (descriptive) statistics, the dems_neat() gives the average age, and number of males (or percentage, if so set), using (automatically) the age and gender columns. (There is no missing age or gender data in the example data; but otherwise the missing numbers would be displayed as well.)

dems_neat(data_final, group_by = 'condition')

The console output is:

Group < mixed >: 87 subjects (age = 24.2±3.8, 49 male)
Group < separate >: 89 subjects (age = 24.8±3.3, 45 male)

The main test is an ANOVA for the interaction Valence (positive vs. negative) × Color (green vs. red) × Group (separate vs. mixed). Since each participant may have several variables of interest (in case of a within-subject design such as in this example), all variables to be included in the test are given using their column names (as strings) in a string vector (or, in case of no within-subject factors, as a single string element), as the values parameter. To determine which within-subject factors we want to contrast in the ANOVA (using the given values), there is a within_ids parameter that accepts a list as argument. In this list, the name of each element is the chosen display name for each factor; in this case “color” and “valence” (but we could use any other names as well). Each element must contain a vector of names that are used to identify which of the value names (given as values) belong to which factor. For example, the Color factor is given as color = c('green', 'red'). Using the given strings 'green' and 'red', the given variable (or value) names 'rt_green_negative' and 'rt_green_positive' will be automatically identified as 'green' (since they contain the string 'green'), while the values 'rt_red_negative' and 'rt_red_positive' will be identified as 'red' (since they contain the string 'red'). The between subject variables can simply given assigned to the between_vars parameter as a string vector, or, in case of only one between-subject factor (as in this example), as a single string element.

In addition, here I specify adding factorial plot (plot_means), normality tests (norm_tests), and variance descriptive statistics and tests (var_tests).

anova_neat(
    data_final,
    values = c(
        'rt_green_negative',
        'rt_green_positive',
        'rt_red_negative',
        'rt_red_positive'
    ),
    within_ids = list(
        color = c('green', 'red'),
        valence = c('positive', 'negative')
    ),
    between_vars = 'condition',
    plot_means = TRUE,
    norm_tests = 'all',
    var_tests = TRUE
)

The following plot is returned.

(Fig. 3) Mean and CI plot

All seems as expected. The error bars show, by default, the 95% CIs of the means. Based on these CIs, the differences seem convincing. (Although, as a side note: in some cases such a plot can actually lead one to underestimate the certainty because it gives no information about the correlation of within-subject variables, which, e.g. in case of RTs, can be extremely high, r > 0.9, hence potentially giving substantial evidence despite very small mean differences.) Note that the plots is implemented via the plot_neat function. Several features are customizable; see ?plot_neat (to which arguments can also be passed via anova_neat). For example, to illustrate variation instead of certainty, we can display, with the error bars, the SDs of the means by adding the eb_method = sd. The main method could be replaced as well, for example, by setting method = median, to get medians instead of means, to control for outliers. (The corresponding error bars could be median absolute deviations; eb_method = mad.)

Here I do not go into details of all output; suffices to say that neither normality nor equal variances tests are violates; though the former is better checked via plotting the residuals (set norm_plots = TRUE) and the latter via checking whether the sample sizes and SDs between the groups are similar within each within-subject level combination (as it is returned when setting var_tests = TRUE).

The main ANOVA output is:

F(1,174) = 29507.56, p < .001, ηp2 = .994, 90% CI [.993, .995], ηG2 = .991. ((Intercept))
F(1,174) = 27.03, p < .001, ηp2 = .134, 90% CI [.065, .213], ηG2 = .094. (condition)
F(1,174) = 0.78, p = .379, ηp2 = .004, 90% CI [0, .035], ηG2 = .001. (color)
F(1,174) = 223.33, p < .001, ηp2 = .562, 90% CI [.483, .622], ηG2 = .112. (valence)
F(1,174) = 0.11, p = .736, ηp2 = .001, 90% CI [0, .019], ηG2 < .001. (color × condition)
F(1,174) = 0.02, p = .888, ηp2 < .001, 90% CI [0, .009], ηG2 < .001. (condition × valence)
F(1,174) = 56.89, p < .001, ηp2 = .246, 90% CI [.159, .330], ηG2 = .038. (color × valence)
F(1,174) = 34.92, p < .001, ηp2 = .167, 90% CI [.090, .248], ηG2 = .024. (color × condition × valence)

Without going into details, the three-way interaction is significant. (To note, the statistics are as close to as possible to reportable format, but italics, subscripts, and superscripts are not well supported as console outputs - hence these have to be adjusted when preparing a manuscript.)

The ANOVA could be repeated for error rates by simply replacing “rt_” with “er_” in the four variable names for the values parameter. Similarly, all the tests below would be the same for ERs (except for changing the variable input), but these are omitted here for brevity.

You follow up (as preregistered of course) with two separate ANOVAs to show the absence of Color x Valence interaction separate condition, and its presence in the mixed condition.

anova_neat(
    data_final[data_final$condition == 'separate', ],
    values = c(
        'rt_green_negative',
        'rt_green_positive',
        'rt_red_negative',
        'rt_red_positive'
    ),
    within_ids = list(
        color = c('green', 'red'),
        valence = c('positive', 'negative')
    ),
    bf_added = TRUE
)

Here I added BFs. While the rest of the numbers will always be identical for the same data, the BF can vary slightly (typically only in fractional digits) due to its inherent random sampling process. My specific out put is:

F(1,88) = 13114.19, p < .001, ηp2 = .993, 90% CI [.991, .995], ηG2 = .990. ((Intercept))
F(1,88) = 0.72, p = .398, ηp2 = .008, 90% CI [0, .064], ηG2 = .001, BF01 = 6.25. (color)
F(1,88) = 117.52, p < .001, ηp2 = .572, 90% CI [.456, .650], ηG2 = .107, BF10 = 2.15 × 10^16. (valence)
F(1,88) = 1.27, p = .262, ηp2 = .014, 90% CI [0, .079], ηG2 = .002, BF01 = 3.14. (color × valence)

As expected, no significant interaction. The BF for the interaction is also just large enough to be labeled as substantial evidence for equivalence. (If you are not convinced, you can use the data_generation_code.R script at https://osf.io/49sq5/ to “take more participants”, and rerun the test with the increased sample size.) To note, BFs supporting equivalence (i.e., are below 1) are always inverse, hence all BFs displayed are above 1, and support for equivalence is indicated by the numbers 01, such as in BF01 (as opposed to BF10, for BF supporting difference). When assigning the anova_neat() function (e.g., my_results = anova_neat(...)), it will return a list that contains, among other things, the exact values of the statistics for each effect, including unconverted BFs.

Now, for the mixed condition.

anova_neat(
    data_final[data_final$condition == 'mixed', ],
    values = c(
        'rt_green_negative',
        'rt_green_positive',
        'rt_red_negative',
        'rt_red_positive'
    ),
    within_ids = list(
        color = c('green', 'red'),
        valence = c('positive', 'negative')
    ),
    bf_added = TRUE
)

My output is:

F(1,86) = 16701.63, p < .001, ηp2 = .995, 90% CI [.993, .996], ηG2 = .992. ((Intercept))
F(1,86) = 0.15, p = .699, ηp2 = .002, 90% CI [0, .041], ηG2 < .001, BF01 = 8.11. (color)
F(1,86) = 106.02, p < .001, ηp2 = .552, 90% CI [.432, .635], ηG2 = .117, BF10 = 4.15 × 10^12. (valence)
F(1,86) = 106.33, p < .001, ηp2 = .553, 90% CI [.433, .635], ηG2 = .121, BF10 = 7.01 × 10^17. (color × valence)

Interaction significant as expected. Now to explore the interaction in the mixed condition, we could do various t-tests (four in “parallel” and even two “crosswise”), but perhaps what’s interesting is to check whether there is a significant difference between red and green in case of either positive or negative words.

First, for convenience, I create a new data frame with only mixed condition.

subjects_mx = excl_neat(data_final, condition == 'mixed')

Now test red versus green for positive words.

t_neat(subjects_mx$rt_green_positive,
       subjects_mx$rt_red_positive,
       pair = TRUE)

Correlation: r(85) = .613, 95% CI [.462, .729], p < .001.
Descriptives: M±SD = 538.72±51.31 vs. 574.93±56.86 (raw mean difference: 36.20, 95% CI [–46.40, –26.01])
t(86) = –7.06, p < .001, d = –0.76, 95% CI [–0.99, –0.52], BF10 = 2.45 × 10^7.

(Along with descriptives, in case of paired samples, t_neat() by default also prints the correlation between the two tested variables.)

Now red versus green for negative words.

t_neat(subjects_mx$rt_green_negative,
       subjects_mx$rt_red_negative,
       pair = TRUE)

Correlation: r(85) = .454, 95% CI [.269, .606], p < .001.
Descriptives: M±SD = 613.31±48.62 vs. 574.20±46.57 (raw mean difference: –39.11, 95% CI [28.50, 49.72])
t(86) = 7.33, p < .001, d = 0.79, 95% CI [0.54, 1.02], BF10 = 8.06 × 10^7.

Both significant. All left to do is print a nice table to show means and SDs as customary.

table_neat(
    list(
        aggr_neat(subjects_merged, rt_green_negative),
        aggr_neat(subjects_merged, rt_green_positive),
        aggr_neat(subjects_merged, rt_red_negative),
        aggr_neat(subjects_merged, rt_red_positive),
        aggr_neat(subjects_merged, er_green_negative),
        aggr_neat(subjects_merged, er_green_positive),
        aggr_neat(subjects_merged, er_red_negative),
        aggr_neat(subjects_merged, er_red_positive)
    ),
    group_by = 'condition'
)

This will produce a table as follows. (Well, this table here is formatted with Markdown notation, but the names and numbers are verbatim.)

aggr_group	rt_green_negative	rt_green_positive	rt_red_negative	rt_red_positive	er_green_negative	er_green_positive	er_red_negative	er_red_positive
mixed	612.79±48.00	537.93±50.81	574.11±46.99	574.40±56.28	0.14±0.08	0.09±0.06	0.15±0.08	0.15±0.07
separate	564.45±57.23	522.21±54.49	556.85±55.88	523.64±49.57	0.14±0.07	0.10±0.07	0.16±0.09	0.10±0.05

This is not so nice though: let’s modify rounding of RTs to zero, and convert error rates to percentages. (For the latter, you need to use a vector input, so in this case the original column values, e.g. subjects_merged$er_green_negative, multiplied manually by 100.)

Also, here I set to_clipboard = TRUE, which puts the table on your clipboard with plain format. This you can copy into Excel, and from that to Word, and you have the table. (Unfortunately Word doesn’t produce nice table when copied there directly.)

(Note: you can also add new names for each column by the aggr_neat’s new_name parameter, but I didn’t; it may be clearer to keep the original names here, and just rename them in the final table in Word.)

table_neat(
    list(
        aggr_neat(subjects_merged, rt_green_negative, round_to = 0),
        aggr_neat(subjects_merged, rt_green_positive, round_to = 0),
        aggr_neat(subjects_merged, rt_red_negative, round_to = 0),
        aggr_neat(subjects_merged, rt_red_positive, round_to = 0),
        aggr_neat(subjects_merged, subjects_merged$er_green_negative * 100),
        aggr_neat(subjects_merged, subjects_merged$er_green_positive * 100),
        aggr_neat(subjects_merged, subjects_merged$er_red_negative * 100),
        aggr_neat(subjects_merged, subjects_merged$er_red_positive * 100)
    ),
    group_by = 'condition',
    to_clipboard = TRUE
)

aggr_group	rt_green_negative	rt_green_positive	rt_red_negative	rt_red_positive	data_final$er_green_negative * 100	data_final$er_green_positive * 100	data_final$er_red_negative * 100	data_final$er_red_positive * 100
mixed	613±49	539±51	574±47	575±57	13.15±7.47	8.69±6.15	14.46±7.78	14.87±7.06
separate	564±58	522±55	556±56	523±50	14.09±6.82	9.35±6.94	15.94±9.32	10.22±5.52

That’s all

You have everything you need to report from this (hypothetical) experiment. This is of course a single specific case, but I think it is fairly easy to generalize to most typical designs used in psychological science.