Map a function by condition across simulation output chunks
Source:R/simulation_output.R
map_by_condition.RdThis function processes simulation output by gathering all chunks, iterating through them one by one, filtering and collecting data by chunk, then applying a user-defined function by condition within each chunk.
Usage
map_by_condition(
simulation_output,
.f,
...,
.combine = dplyr::bind_rows,
.parallel = NULL,
.n_cores = NULL,
.progress = FALSE
)Arguments
- simulation_output
A eam_simulation_output object containing the dataset and configuration
- .f
A function to apply to each condition's data. The function should accept a data frame representing one condition's results
- ...
Additional arguments passed to the function .f
- .combine
Function to combine results (default: dplyr::bind_rows)
- .parallel
Logical or NULL.
- .n_cores
Integer. Number of CPU cores to use for parallel processing. If NULL, uses
detectCores() - 1. Only used when.parallel = TRUE.- .progress
Logical, whether to show a progress bar (default: FALSE)
Value
A list containing the results of applying .f to each condition, with names corresponding to condition indices
Details
This function handles out-of-core computation automatically using Apache Arrow, so you don't need to understand Arrow internals. It loads data chunk by chunk to avoid memory issues with large simulations.
If you prefer to manually work with the raw Arrow dataset, you can access it
via simulation_output$open_dataset(), which returns an Arrow Dataset
object. You can then use dplyr verbs to filter and query before calling
dplyr::collect() to load data into memory.
Examples
# Load simulation output
sim_output_path <- system.file(
"extdata", "rdm_minimal", "simulation",
package = "eam"
)
sim_output <- load_simulation_output(sim_output_path)
# Define a summary pipeline
summary_pipe <- summarise_by(
.by = c("condition_idx"),
rt_mean = mean(rt),
rt_quantiles = quantile(rt, probs = c(0.1, 0.5, 0.9))
)
# Apply function to each condition
sim_sumstat <- map_by_condition(
sim_output,
.progress = FALSE,
.parallel = FALSE,
function(cond_df) {
summary_pipe(cond_df)
}
)