Title: | Preparing Experimental Data for Statistical Analysis |
---|---|
Description: | Prepares data for statistical analysis (e.g., analysis of variance; ANOVA) by enabling the user to easily and quickly merge (using the file_merge() function) raw data files into one merged table and then aggregate the merged table (using the prep() function) into a finalized table while keeping track of and summarizing every step of the preparation. The finalized table contains several possibilities for dependent measures of the dependent variable. Most suitable when measuring variables in an interval or ratio scale (e.g., reaction-times) and/or discrete values such as accuracy. Main functions included are file_merge() and prep(). The file_merge() function vertically merges individual data files (in a long format), in which each line is a single observation, into one single dataset. The prep() function aggregates the single dataset according to any combination of grouping variables (i.e., between-subjects and within-subjects independent variables), and returns a data frame with a number of dependent measures for further analysis for each cell according to the combination of provided grouping variables. Dependent measures for each cell include, among others, means before and after rejecting all values according to a flexible standard deviation criterion, number of rejected values according to that criterion, proportions of rejected values according to that criterion, number of values before rejection, means after rejecting values according to procedures described in Van Selst & Jolicoeur (1994; suitable when measuring reaction-times), standard deviations, medians, means according to any percentile (e.g., 0.05, 0.25, 0.75, 0.95) and harmonic means. The data frame prep() returns can also be exported as a txt file to be used for statistical analysis in other statistical programs. |
Authors: | Ayala S. Allon [aut, cre], Roy Luria [aut], James Grange [ctb], Nachshon Meiran [ctb] |
Maintainer: | Ayala S. Allon <[email protected]> |
License: | GPL-3 |
Version: | 1.0.8 |
Built: | 2025-01-09 05:10:54 UTC |
Source: | https://github.com/ayalaallon/prepdat |
Vertically concatenates files containing data tables in a long format into a single large dataset. In order for the function to work, all files you wish to merge should be in the same format (either txt or csv). This function is very useful for concatenating raw data files of individual subjects in an experiment (in which each line corresponds to a single observation in the experiment) to one raw data file that includes all subjects.
file_merge(folder_path = NULL, has_header = TRUE, new_header = c(), raw_file_name = NULL, raw_file_extension = NULL, file_name = "dataset.txt", save_table = TRUE, dir_save_table = NULL, notification = TRUE)
folder_path |
A string with the path of the folder in which files to be
merged are searched. Search is recursive (i.e., can search also in
subdirectories). |
has_header |
Logical. If |
new_header |
String vector with names for columns of the merged table.
Default is |
raw_file_name |
A string with the name of the files to be searched
and then merged. File extension should NOT be included here (see
|
raw_file_extension |
A string with the format of the files (i.e.,
|
file_name |
A string with the name of the file of the merged table the
function creates in case |
save_table |
Logical. If |
dir_save_table |
A string with the path of the folder in which the
merged table is saved in case |
notification |
Logical. If |
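The vertical merge described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: `merge_long_files()` and its arguments are hypothetical names, and the sketch assumes that every file has the same columns and a header row.

```r
# Base-R sketch of vertical merging; NOT the implementation of file_merge().
# merge_long_files() and its arguments are hypothetical names.
merge_long_files <- function(folder, pattern = "\\.txt$", header = TRUE) {
  # Search recursively, mirroring the recursive search described for folder_path
  files <- list.files(folder, pattern = pattern, recursive = TRUE,
                      full.names = TRUE)
  # Read every file into a data frame (all files must share the same columns)
  tables <- lapply(files, read.table, header = header,
                   stringsAsFactors = FALSE)
  # Stack the tables row-wise into one long dataset
  do.call(rbind, tables)
}

# Usage with two small made-up subject files in a temporary folder
tmp <- file.path(tempdir(), "merge_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(subject = 1, rt = c(450, 520)),
            file.path(tmp, "subject1.txt"), row.names = FALSE)
write.table(data.frame(subject = 2, rt = c(610, 480)),
            file.path(tmp, "subject2.txt"), row.names = FALSE)
merged <- merge_long_files(tmp)
print(merged)
```

Each row of `merged` is still a single observation, so the result stays in long format and can be handed on for aggregation.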
The merged table prepdat::prep() returns for stroopdata according to the example in prepdat::prep().
A data frame containing the dependent measures prep() returns for each id, calculated according to the grouping variables block and target_type. prep() aggregates the columns for the dependent measures by first dividing them according to the levels of the first independent variable in within_vars, and then, within each level, dividing the columns according to the next variable in within_vars, and so forth. Thus, for each dependent measure in this example there are four columns, ordered according to the order in which the variables were entered in the within_vars argument of prep(). For this data frame this argument was within_vars = c("block", "target_type").
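The column ordering described above (first within-subjects variable varying slowest, the next one varying within it) can be sketched as follows. The level labels and the column naming scheme are assumptions made for illustration; the text does not show the real column names.

```r
# Hypothetical level labels; the real column naming scheme is an assumption.
within_levels <- list(block = c("block1", "block2"),
                      target_type = c("target1", "target2"))

# expand.grid() varies its first factor fastest, so reverse the list to make
# the first within variable vary slowest, then restore the original column order
cells <- expand.grid(rev(within_levels), stringsAsFactors = FALSE)
cells <- cells[, names(within_levels)]

# One column per cell for a single dependent measure, here "mdvc"
col_names <- paste("mdvc", cells$block, cells$target_type, sep = "_")
print(col_names)
```

With two levels in each of the two variables this yields the four columns per dependent measure mentioned above.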
data(finalized_stroopdata)
A data frame with 15 rows and 98 columns.
The complete list of names of the dependent measures is:
mdvc: mean dvc.
sdvc: SD for dvc.
meddvc: median dvc.
tdvc: mean dvc after rejecting observations above the standard deviation criteria specified in sd_criterion.
ntr: number of observations rejected for each standard deviation criterion specified in sd_criterion.
ndvc: number of observations before rejection.
ptr: proportion of observations rejected for each standard deviation criterion specified in sd_criterion.
rminv: harmonic mean of dvc.
prt: dvc according to each of the percentiles specified in percentiles.
mdvd: mean dvd.
merr: mean error.
nrmc: mean dvc according to non-recursive procedure with moving criterion.
nnrmc: number of observations rejected for dvc according to non-recursive procedure with moving criterion.
pnrmc: percent of observations rejected for dvc according to non-recursive procedure with moving criterion.
tnrmc: total number of observations upon which the non-recursive procedure with moving criterion was applied.
mrmc: mean dvc according to modified-recursive procedure with moving criterion.
nmrmc: number of observations rejected for dvc according to modified-recursive procedure with moving criterion.
pmrmc: percent of observations rejected for dvc according to modified-recursive procedure with moving criterion.
tmrmc: total number of observations upon which the modified-recursive procedure with moving criterion was applied.
hrmc: mean dvc according to hybrid-recursive procedure with moving criterion.
nhrmc: number of observations rejected for dvc according to hybrid-recursive procedure with moving criterion.
thrmc: total number of observations upon which the hybrid-recursive procedure with moving criterion was applied.
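To make the standard deviation measures in the list above (tdvc, ntr, ndvc, ptr) concrete, here is a small base-R sketch for one experimental cell. `sd_trim()` is a hypothetical helper, not a prepdat function, and it assumes that "above" means more than k standard deviations above the cell mean (upper side only).

```r
# Sketch: reject observations more than `k` SDs above the cell mean.
# sd_trim() is a hypothetical helper, not part of prepdat; one-sided
# (upper) rejection is an assumption based on the wording "above".
sd_trim <- function(x, k) {
  cutoff <- mean(x) + k * sd(x)
  kept <- x[x <= cutoff]
  c(tdvc = mean(kept),                              # mean after rejection
    ntr  = length(x) - length(kept),                # number rejected
    ndvc = length(x),                               # number before rejection
    ptr  = (length(x) - length(kept)) / length(x))  # proportion rejected
}

# Made-up reaction times for a single cell, with one slow trial
rt <- c(400, 420, 410, 430, 415, 405, 425, 900)

# One column of results per criterion, as with sd_criterion = c(1, 1.5, 2)
res <- sapply(c(1, 1.5, 2), function(k) sd_trim(rt, k))
colnames(res) <- c("sd1", "sd1.5", "sd2")
print(res)
```

In this made-up cell the 900 ms trial exceeds the cutoff under all three criteria, so each column reports one rejected observation out of eight.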
data(finalized_stroopdata)
head(finalized_stroopdata)
Hybrid-recursive outlier removal procedure with moving criterion according to Van Selst & Jolicoeur (1994).
hybrid_recursive_mc(exp_cell)
exp_cell |
Numeric vector on which the outlier removal method takes
place. If experimental cell has 4 trials or less it will result in
|
A vector with the mean of exp_cell after removing outliers, percent of trials removed, and total number of trials in exp_cell before outlier removal.
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.0. https://cran.r-project.org/package=trimr
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The quarterly journal of experimental psychology, 47(3), 631-650.
Modified-recursive outlier removal procedure with moving criterion according to Van Selst & Jolicoeur (1994).
modified_recursive_mc(exp_cell)
exp_cell |
Numeric vector on which the outlier removal method takes
place. If experimental cell has 4 trials or less it will result in
|
A vector with the mean of exp_cell after removing outliers, percent of trials removed, number of trials removed in the procedure, and total number of trials in exp_cell before outlier removal.
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.0. https://cran.r-project.org/package=trimr
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The quarterly journal of experimental psychology, 47(3), 631-650.
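As a rough illustration of a modified-recursive trimming loop, the sketch below temporarily sets aside the largest value, computes cutoffs from the remaining values, and permanently removes the largest and/or smallest value when they fall outside the cutoffs, repeating until nothing more is removed. This is one reading of the procedure, not the package's code. `modified_recursive_sketch()` is a hypothetical name, the constant criterion of 2.5 is a placeholder for the sample-size-dependent table in Van Selst & Jolicoeur (1994), which is not reproduced here, and returning NA for cells with 4 trials or fewer is an assumption.

```r
# Sketch only; NOT the implementation of modified_recursive_mc().
# `criterion` is a hypothetical stand-in for the sample-size-dependent
# criterion table of Van Selst & Jolicoeur (1994).
modified_recursive_sketch <- function(exp_cell, criterion = function(n) 2.5) {
  n_total <- length(exp_cell)
  # Assumption: small cells are not trimmed and yield NA
  if (n_total <= 4) {
    return(c(mean = NA, percent_removed = NA, n_removed = NA,
             n_total = n_total))
  }
  x <- sort(exp_cell)
  repeat {
    n <- length(x)
    if (n <= 4) break
    rest <- x[-n]                      # temporarily set aside the largest value
    k <- criterion(n)
    lower <- mean(rest) - k * sd(rest)
    upper <- mean(rest) + k * sd(rest)
    drop_high <- x[n] > upper          # largest value beyond upper cutoff?
    drop_low <- x[1] < lower           # smallest value beyond lower cutoff?
    if (!drop_high && !drop_low) break
    if (drop_high) x <- x[-n]
    if (drop_low) x <- x[-1]
  }
  n_removed <- n_total - length(x)
  c(mean = mean(x), percent_removed = 100 * n_removed / n_total,
    n_removed = n_removed, n_total = n_total)
}

# Made-up cell with one very slow trial
rt <- c(500, 510, 490, 505, 495, 500, 508, 492, 503, 497, 2000)
res <- modified_recursive_sketch(rt)
print(res)
```

The returned vector mirrors the four quantities listed in the value description above: mean after removal, percent removed, number removed, and total number of trials.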
Non-recursive outlier removal procedure with moving criterion according to Van Selst & Jolicoeur (1994).
non_recursive_mc(exp_cell)
exp_cell |
Numeric vector on which the outlier removal method takes
place. If experimental cell has 4 trials or less it will result in
|
A vector with the mean of exp_cell after removing outliers, percent of trials removed, number of trials removed in the procedure, and total number of trials in exp_cell before outlier removal.
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.0. https://cran.r-project.org/package=trimr
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The quarterly journal of experimental psychology, 47(3), 631-650.
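A non-recursive trim makes a single pass: it computes the mean and SD of the whole cell once and drops every value beyond a criterion number of SDs from the mean. The sketch below illustrates that idea only. `non_recursive_sketch()` and `criterion_for_n()` are hypothetical names, the constant 2.5 stands in for the sample-size-dependent criterion table in Van Selst & Jolicoeur (1994), which is not reproduced here, and returning NA for cells with 4 trials or fewer is an assumption.

```r
# Hypothetical stand-in for the sample-size-dependent criterion table in
# Van Selst & Jolicoeur (1994); the real values are NOT reproduced here.
criterion_for_n <- function(n) 2.5

# Sketch only; NOT the implementation of non_recursive_mc().
non_recursive_sketch <- function(exp_cell, criterion = criterion_for_n) {
  n <- length(exp_cell)
  # Assumption: small cells are not trimmed and yield NA
  if (n <= 4) {
    return(c(mean = NA, percent_removed = NA, n_removed = NA, n_total = n))
  }
  k <- criterion(n)
  m <- mean(exp_cell)
  s <- sd(exp_cell)
  # Single pass: keep values within k SDs of the untrimmed mean
  keep <- exp_cell >= m - k * s & exp_cell <= m + k * s
  c(mean = mean(exp_cell[keep]),
    percent_removed = 100 * sum(!keep) / n,
    n_removed = sum(!keep),
    n_total = n)
}

# Made-up cell with one very slow trial
rt <- c(500, 510, 490, 505, 495, 500, 508, 492, 503, 497, 2000)
res <- non_recursive_sketch(rt)
print(res)
```

Because the outlier itself inflates the SD used for the cutoffs, a single-pass trim is more lenient than a recursive one for the same criterion.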
prep() aggregates a single dataset in a long format according to any number of grouping variables. This makes prep() suitable for aggregating data from various types of experimental designs, such as between-subjects, within-subjects (i.e., repeated measures), and mixed designs (i.e., experimental designs that include both between- and within-subjects independent variables). prep() returns a data frame with a number of dependent measures for further analysis for each aggregated cell (i.e., experimental cell) according to the provided grouping variables (i.e., independent variables). Dependent measures for each experimental cell include, among others, means before and after rejecting observations according to a flexible standard deviation criterion, the number of rejected observations according to that criterion, the proportion of rejected observations according to that criterion, the number of observations before rejection, means after rejecting observations according to the procedures described in Van Selst & Jolicoeur (1994; suitable when measuring reaction-times), standard deviations, medians, means according to any percentile (e.g., 0.05, 0.25, 0.75, 0.95), and harmonic means. The data frame prep() returns can also be exported as a txt or csv file to be used for statistical analysis in other statistical programs.
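The core idea of aggregating long-format trials into one row per subject with one column per experimental cell can be sketched in base R. This is an illustration under assumptions, not what prep() does internally: the data are made up, only the cell mean is computed, and the column labels are hypothetical.

```r
# Made-up long-format data: 2 subjects x 2 blocks x 2 target types x 2 trials
raw <- data.frame(
  subject     = rep(1:2, each = 8),
  block       = rep(rep(1:2, each = 4), times = 2),
  target_type = rep(rep(1:2, each = 2), times = 4),
  rt          = c(500, 520, 600, 640, 480, 500, 700, 720,
                  450, 470, 550, 570, 430, 450, 650, 670)
)

# Mean reaction time per subject and experimental cell
cell_means <- aggregate(rt ~ subject + block + target_type,
                        data = raw, FUN = mean)

# Label each cell (hypothetical naming), then spread cells into columns
cell_means$cell <- paste0("mdvc_b", cell_means$block,
                          "_t", cell_means$target_type)
wide <- reshape(cell_means[, c("subject", "cell", "rt")],
                idvar = "subject", timevar = "cell", direction = "wide")
print(wide)
```

The wide table has one row per subject, which is the layout many ANOVA routines in other statistical programs expect.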
prep(dataset = NULL, file_name = NULL, file_path = NULL, id = NULL, within_vars = c(), between_vars = c(), dvc = NULL, dvd = NULL, keep_trials = NULL, drop_vars = c(), keep_trials_dvc = NULL, keep_trials_dvd = NULL, id_properties = c(), sd_criterion = c(1, 1.5, 2), percentiles = c(0.05, 0.25, 0.75, 0.95), outlier_removal = NULL, keep_trials_outlier = NULL, decimal_places = 4, notification = TRUE, dm = c(), save_results = TRUE, results_name = "results.txt", results_path = NULL, save_summary = TRUE)
dataset |
Name of the data frame in R that contains the long format
table after merging the individual data files using
|
file_name |
A string with the name of a txt or csv file (including the
file extension, e.g. |
file_path |
A string with the path of the folder in which
|
id |
A string with the name of the column in |
within_vars |
String vector with names of grouping variables in
|
between_vars |
String vector with names of grouping variables in
|
dvc |
A string with the name of the column in |
dvd |
A string with the name of the column in |
keep_trials |
A string. Allows deleting unnecessary observations and
keeping necessary observations in |
drop_vars |
String vector with names of columns to delete in |
keep_trials_dvc |
A string. Allows deleting unnecessary observations
and keeping necessary observations in |
keep_trials_dvd |
A string. Allows deleting unnecessary observations
and keeping necessary observations in |
id_properties |
String vector with names of columns in |
sd_criterion |
Numeric vector specifying a number of standard deviation
criteria for which |
percentiles |
Numeric vector containing wanted percentiles for |
outlier_removal |
Numeric. Specifies which outlier removal procedure
with moving criterion to calculate for |
keep_trials_outlier |
A string. Allows deleting unnecessary
observations and keeping necessary observations in |
decimal_places |
Numeric. Specifies number of decimals to be written
in |
notification |
Logical. If |
dm |
String vector with names of dependent measures the function
returns. If empty (i.e., |
save_results |
Logical. If TRUE, the function creates a txt file
containing the returned data frame. Default is |
results_name |
A string with the name of the file |
results_path |
A string with the path of the folder in which
|
save_summary |
Logical. If |
A data frame with dependent measures for the dependent variables in dvc and dvd by id and grouping variables.
The first column in the finalized table is the id column. If id_properties was used, the next columns hold the value of each id_properties for each id. If between_vars was used, the next columns hold the value of each between_vars for each id.
The next columns of the finalized table contain the dependent measures according to the design specified. If within_vars was used, the data for each dependent measure was first divided according to the levels of the first grouping variable in within_vars, and then within each of those levels prep() divided the data according to the next variable in within_vars, and so forth.
The dependent measures in the finalized table are:
mdvc: mean dvc.
sdvc: SD for dvc.
meddvc: median dvc.
tdvc: mean dvc after rejecting observations above the standard deviation criteria specified in sd_criterion.
ntr: number of observations rejected for each standard deviation criterion specified in sd_criterion.
ndvc: number of observations before rejection.
ptr: proportion of observations rejected for each standard deviation criterion specified in sd_criterion.
rminv: harmonic mean of dvc.
prt: dvc according to each of the percentiles specified in percentiles.
mdvd: mean dvd.
merr: mean error.
nrmc: mean dvc according to non-recursive procedure with moving criterion.
nnrmc: number of observations rejected for dvc according to non-recursive procedure with moving criterion.
pnrmc: percent of observations rejected for dvc according to non-recursive procedure with moving criterion.
tnrmc: total number of observations upon which the non-recursive procedure with moving criterion was applied.
mrmc: mean dvc according to modified-recursive procedure with moving criterion.
nmrmc: number of observations rejected for dvc according to modified-recursive procedure with moving criterion.
pmrmc: percent of observations rejected for dvc according to modified-recursive procedure with moving criterion.
tmrmc: total number of observations upon which the modified-recursive procedure with moving criterion was applied.
hrmc: mean dvc according to hybrid-recursive procedure with moving criterion.
nhrmc: number of observations rejected for dvc according to hybrid-recursive procedure with moving criterion.
thrmc: total number of observations upon which the hybrid-recursive procedure with moving criterion was applied.
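Two of the measures in the list above, the harmonic mean (rminv) and the percentile values (prt), are easy to check by hand for a single cell. The sketch below uses made-up reaction times; using the default interpolation of `quantile()` is an assumption, since the text does not say how prep() computes percentiles.

```r
# Made-up reaction times for a single experimental cell
rt <- c(400, 500, 625, 800, 1000)

# Harmonic mean: reciprocal of the mean of reciprocals
harmonic_mean <- 1 / mean(1 / rt)

# Percentile values for the default percentiles argument;
# quantile()'s default interpolation method is an assumption here
pct <- quantile(rt, probs = c(0.05, 0.25, 0.75, 0.95))

print(harmonic_mean)
print(pct)
```

The harmonic mean is pulled less by the slow 1000 ms trial than the arithmetic mean (665 ms), which is why it is sometimes reported for reaction times.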
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.1. https://CRAN.R-project.org/package=trimr
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The quarterly journal of experimental psychology, 47(3), 631-650.
data(stroopdata)
finalized_stroopdata <- prep(
  dataset = stroopdata,
  file_name = NULL,
  file_path = NULL,
  id = "subject",
  within_vars = c("block", "target_type"),
  between_vars = c("order"),
  dvc = "rt",
  dvd = "ac",
  keep_trials = NULL,
  drop_vars = c(),
  keep_trials_dvc = "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1",
  keep_trials_dvd = "raw_data$rt > 100 & raw_data$rt < 3000",
  id_properties = c(),
  sd_criterion = c(1, 1.5, 2),
  percentiles = c(0.05, 0.25, 0.75, 0.95),
  outlier_removal = 2,
  keep_trials_outlier = "raw_data$ac == 1",
  decimal_places = 0,
  notification = TRUE,
  dm = c(),
  save_results = FALSE,
  results_name = "results.txt",
  results_path = NULL,
  save_summary = FALSE
)
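In the example above, the keep_trials_dvc, keep_trials_dvd, and keep_trials_outlier arguments are strings containing logical conditions on a data frame called raw_data. One plausible way such a string could be turned into a row filter is shown below; that prep() evaluates the string in this way is an assumption, and the data are made up.

```r
# Made-up trials; the data frame must be named raw_data for the string to work
raw_data <- data.frame(rt = c(90, 450, 520, 3500, 610),
                       ac = c(1, 1, 0, 1, 1))

# Same condition string as keep_trials_dvc in the example
keep_trials_dvc <- "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"

# Assumption: the string is parsed and evaluated as an R expression
keep <- eval(parse(text = keep_trials_dvc))
filtered <- raw_data[keep, ]
print(filtered)
```

Only correct trials with reaction times between 100 and 3000 ms survive, which matches the intent of the condition used for dvc in the example.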
Reads a File in a txt or csv Format that Contains a Table and Creates a Data Frame from it
read_data(file_name, file_path = NULL, notification = TRUE)
file_name |
A string with the name of the file to be read into R. The string should include the file extension. |
file_path |
A string with the path to the folder in which the file to
read is located. Default is |
notification |
Logical. If |
A data frame of the table specified in file_name.
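Reading a table by file extension can be sketched in base R as follows. `read_table_sketch()` is a hypothetical name and not the implementation of read_data(); the sketch assumes that txt files are whitespace-delimited and that both formats have a header row.

```r
# Sketch only; NOT the implementation of read_data().
read_table_sketch <- function(file_name, file_path = NULL) {
  # Build the full path when a folder is given
  full <- if (is.null(file_path)) file_name else file.path(file_path, file_name)
  # Pick a reader from the file extension
  ext <- tolower(tools::file_ext(full))
  switch(ext,
         txt = read.table(full, header = TRUE),  # assumes whitespace-delimited
         csv = read.csv(full),
         stop("Unsupported file extension: ", ext))
}

# Usage: write a small made-up csv file and read it back
demo_dir <- tempdir()
write.csv(data.frame(subject = c(1, 2), rt = c(450, 610)),
          file.path(demo_dir, "demo_read.csv"), row.names = FALSE)
demo <- read_table_sketch("demo_read.csv", file_path = demo_dir)
print(demo)
```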
A dataset containing reaction-times, accuracy, and other attributes of 5400 experimental trials.
data(stroopdata)
A data frame with 5401 rows and 10 columns:
Case identifier, in numerals
Percent of congruent target_type trials in a block. 1 means 80 percent congruent, 2 means 20 percent congruent
Age of subject, in integers
Gender of subject, in integers. 1 means male, 2 means female
Order of blocks, in integers. 1 means subject did 80 percent congruent block first and 20 percent congruent block second. 2 means subject did 20 percent congruent block first and 80 percent congruent block second.
Font size of the stimulus, in integers
Trial number, in integers
Type of stimulus for a given trial. 1 means congruent stimulus, 2 means incongruent stimulus
Reaction time, in milliseconds
Accuracy, 1 means correct, 0 means incorrect
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of experimental psychology, 18(6), 643.
data(stroopdata)
head(stroopdata)
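Because several columns are stored as integer codes, it can help to attach readable labels before plotting or tabulating. The sketch below applies the codings described above to a small made-up data frame; the column names block, target_type, and ac are taken from the prep() example, and the label text is a choice made here for illustration.

```r
# Made-up trials using the integer codings described for stroopdata
trials <- data.frame(block = c(1, 2, 1),
                     target_type = c(1, 2, 2),
                     ac = c(1, 0, 1))

# Attach human-readable labels to the coded columns
trials$block_label <- factor(trials$block, levels = c(1, 2),
                             labels = c("80% congruent", "20% congruent"))
trials$target_label <- factor(trials$target_type, levels = c(1, 2),
                              labels = c("congruent", "incongruent"))
trials$ac_label <- ifelse(trials$ac == 1, "correct", "incorrect")
print(trials)
```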