Statistical analysis of random walk step distributions across grid sizes and walker counts using targets pipeline
Overview
This vignette uses the targets package to analyze how the distribution of steps per walker varies across different conditions:
Grid size: How does grid size affect path lengths?
Walker order: Do early vs late walkers differ?
Simulation progress: Does behavior change as the simulation progresses?
Termination reason: How do paths differ by termination type?
We use targets for reproducible, cacheable analysis with dynamic branching.
Load Packages
Show code
# Load development version of randomwalkif (!requireNamespace("randomwalk", quietly =TRUE)) { devtools::load_all(".") # Use current directory, don't need 'here' package}library(randomwalk)library(ggplot2)library(dplyr)library(tidyr)library(purrr)library(DT) # For interactive sortable tables
Targets Pipeline
Define Analysis Functions
Show code
#' Extract step data from simulation result#'#' @param result Simulation result from run_simulation()#' @return Data frame with walker-level step informationextract_step_data <-function(result) { walkers <- result$walkers grid_size <- result$parameters$grid_size# Extract data for each walker data <-map_dfr(seq_along(walkers), function(i) { walker <- walkers[[i]]tibble(walker_id = i,walker_order = i, # Order in which walker was createdsteps = walker$steps,termination_reason = walker$termination_reason %||%"unknown",grid_size = grid_size,# Calculate percentile of walker in simulationwalker_percentile = i /length(walkers) *100 ) }) data}#' Analyze step distributions by termination reason#'#' @param step_data Data frame from extract_step_data()#' @return Summary statistics by termination reason and grid sizeanalyze_by_termination <-function(step_data) { step_data %>%group_by(grid_size, termination_reason) %>%summarise(n_walkers =n(),mean_steps =mean(steps),median_steps =median(steps),sd_steps =sd(steps),min_steps =min(steps),max_steps =max(steps),q25_steps =quantile(steps, 0.25),q75_steps =quantile(steps, 0.75),.groups ="drop" )}#' Analyze step distributions by walker order#'#' @param step_data Data frame from extract_step_data()#' @param n_groups Number of percentile groups (default 10 for deciles)#' @return Summary statistics by walker percentile binsanalyze_by_walker_order <-function(step_data, n_groups =10) {# Create percentile breaks (e.g., 10 groups = deciles: 0-10%, 10-20%, ..., 90-100%) breaks <-seq(0, 100, length.out = n_groups +1)# Create labels for groups labels <-sapply(seq_len(n_groups), function(i) {sprintf("%d-%d%%", breaks[i], breaks[i +1]) }) step_data %>%mutate(walker_group =cut( walker_percentile,breaks = breaks,labels = labels,include.lowest =TRUE ) ) %>%group_by(grid_size, walker_group, termination_reason) %>%summarise(n_walkers =n(),mean_steps =mean(steps),median_steps =median(steps),.groups ="drop" )}#' Create visualization of step distributions#'#' @param step_data Data frame from extract_step_data()#' @return ggplot objectplot_step_distributions <-function(step_data) {ggplot(step_data, aes(x = steps, fill = termination_reason)) +geom_histogram(bins =30, alpha =0.7, position ="identity") +facet_wrap(~grid_size, scales ="free_y", ncol =2) +scale_fill_brewer(palette ="Set2", name ="Termination") +labs(title ="Distribution of Steps per Walker",subtitle ="By grid size and termination reason",x ="Number of Steps",y ="Count",caption ="Histogram shows frequency distribution of walker path lengths. Larger grids show higher step counts." ) +theme_minimal() +theme(legend.position ="bottom")}#' Create visualization of steps vs walker order#'#' @param step_data Data frame from extract_step_data()#' @return ggplot objectplot_steps_by_order <-function(step_data) {ggplot(step_data, aes(x = walker_order, y = steps, color = termination_reason)) +geom_point(alpha =0.5, size =2) +geom_smooth(method ="loess", se =TRUE, alpha =0.2) +facet_wrap(~grid_size, scales ="free", ncol =2) +scale_color_brewer(palette ="Set2", name ="Termination") +labs(title ="Steps vs Walker Order Percentile",subtitle ="Do early vs late walkers behave differently?",x ="Walker Order Percentile (e.g., 50% = 50th walker out of 100)",y ="Number of Steps",caption ="Each point represents one walker. For 100 walkers, 50% corresponds to the 50th walker created." ) +theme_minimal() +theme(legend.position ="bottom")}#' Create visualization of average steps by walker percentile#'#' @param summary_data Summary data from analyze_by_walker_order()#' @return ggplot objectplot_steps_by_percentile <-function(summary_data) {ggplot(summary_data, aes(x = walker_group, y = mean_steps, fill = termination_reason)) +geom_col(position ="dodge", alpha =0.8) +geom_errorbar(aes(ymin = mean_steps -10, ymax = mean_steps +10),position =position_dodge(0.9),width =0.2,alpha =0.5 ) +facet_wrap(~grid_size, scales ="free_y", ncol =2) +scale_fill_brewer(palette ="Set2", name ="Termination") +labs(title ="Average Steps by Walker Cohort (Deciles)",subtitle ="Comparing walker performance across creation order percentiles",x ="Walker Percentile Group",y ="Mean Steps",caption ="Default shows 10 groups (deciles). Each bar represents walkers in that percentile range." ) +theme_minimal() +theme(legend.position ="bottom",axis.text.x =element_text(angle =45, hjust =1) )}
Define Targets Plan
Show code
# This would go in _targets.R filelibrary(targets)library(tarchetypes)# Set up parallel processing if availabletar_option_set(packages =c("randomwalk", "dplyr", "ggplot2", "purrr", "tidyr"),format ="rds")# Define grid sizes to analyzegrid_sizes <-c(20, 40, 60, 80)list(# Dynamic branching: Run simulations for each grid sizetar_target(name = grid_size_values,command = grid_sizes ),tar_target(name = simulation_results,command =run_simulation(grid_size = grid_size_values,n_walkers =200,neighborhood ="4-hood",boundary ="terminate",max_steps =5000,workers =0,verbose =FALSE ),pattern =map(grid_size_values),iteration ="list" ),# Extract step data from each simulationtar_target(name = step_data_by_grid,command =extract_step_data(simulation_results),pattern =map(simulation_results),iteration ="list" ),# Combine all step datatar_target(name = step_data_combined,command =bind_rows(step_data_by_grid) ),# Analysis: By termination reasontar_target(name = termination_summary,command =analyze_by_termination(step_data_combined) ),# Analysis: By walker ordertar_target(name = walker_order_summary,command =analyze_by_walker_order(step_data_combined) ),# Visualizationstar_target(name = plot_distributions,command =plot_step_distributions(step_data_combined) ),tar_target(name = plot_by_order,command =plot_steps_by_order(step_data_combined) ),tar_target(name = plot_by_percentile,command =plot_steps_by_percentile(walker_order_summary) ))
Run Analysis
Show code
# Load pre-computed simulations from targets# These are computed in _targets.R with set.seed(123) for reproducibilitylibrary(targets)simulation_results <-tar_read(step_dist_sims)grid_sizes <-tar_read(step_dist_grid_sizes)
Overall step statistics by grid size (with normalized columns)
grid_size
overall_mean
overall_median
overall_mean_per_grid
overall_median_per_grid
20
19.62
14.78
0.98
0.74
40
141.00
100.22
3.52
2.51
60
373.67
267.48
6.23
4.46
Scaling: Mean steps scales roughly linearly with grid size
Variance: Larger grids show higher variance in path lengths
Walker Order Effects
Show code
order_effect_summary <- walker_order_summary %>%group_by(walker_group) %>%summarise(avg_mean_steps =mean(mean_steps),avg_median_steps =mean(median_steps),.groups ="drop" )order_effect_summary %>% knitr::kable(digits =1,caption ="Step statistics by walker cohort (averaged across grid sizes and termination reasons)" )
Step statistics by walker cohort (averaged across grid sizes and termination reasons)
walker_group
avg_mean_steps
avg_median_steps
0-10%
247.8
233.1
10-20%
107.0
101.3
20-30%
116.4
75.3
30-40%
161.5
120.4
40-50%
215.2
163.1
50-60%
157.5
153.1
60-70%
220.8
180.8
70-80%
170.1
175.4
80-90%
171.3
183.2
90-100%
155.8
121.5
Show code
# Calculate actual percentage difference between first and last decilesfirst_decile_mean <- order_effect_summary$avg_mean_steps[1]last_decile_mean <- order_effect_summary$avg_mean_steps[nrow(order_effect_summary)]pct_diff <-round((first_decile_mean - last_decile_mean) / last_decile_mean *100, 0)
Walker order effect: First decile (0-10%) averages 247.8 steps vs last decile (90-100%) with 155.8 steps (~59% difference)
Trend: Generally shows declining path lengths from early to late walkers, though not perfectly monotonic across all deciles
Mechanism: Black pixel accumulation restricts movement options for later walkers
Termination Reason Effects
Note: Statistics aggregated across ALL grid sizes (20×20, 40×40, 60×60).
Show code
# Compare termination reasons across all grid sizes and walkerstermination_effect_summary <- termination_summary %>%group_by(termination_reason) %>%summarise(total_walkers =sum(n_walkers),avg_steps =weighted.mean(mean_steps, n_walkers),avg_sd =weighted.mean(sd_steps, n_walkers),.groups ="drop" ) %>%mutate(proportion = total_walkers /sum(total_walkers) )termination_effect_summary %>% knitr::kable(digits =3,caption ="Overall statistics by termination reason (across all grid sizes)" )
Overall statistics by termination reason (across all grid sizes)
termination_reason
total_walkers
avg_steps
avg_sd
proportion
black_neighbor
91
175.604
151.448
0.303
hit_boundary
204
182.691
167.510
0.680
touched_black
5
36.000
33.005
0.017
Boundary terminations: Most common (68% of walkers), moderate path lengths
Black neighbor terminations: Less common (30% of walkers), varying path lengths
Max steps terminations: Walkers that exceeded safety limit without finding termination condition
Implication: Most walkers encounter grid boundaries before accumulating enough black pixels to create connected patterns
Conclusion
This vignette demonstrates how step distributions vary across different simulation conditions using decile-based analysis (10 percentile groups) for fine-grained insights. The key findings are:
Grid size scaling: Mean steps scales roughly linearly with grid size (normalized columns show steps/grid_size ratios)
Walker order matters: Early walkers (first decile) explore significantly longer paths than late walkers (last decile)
Termination dominance: Boundaries terminate most walkers, reflecting sparse grid occupancy
Decile-level trends: Analysis reveals nuanced patterns across 10 percentile groups, showing gradual progression rather than sharp quartile boundaries
Black pixel accumulation: Later walkers face increasing restrictions from accumulated black pixels, though effect varies by grid size and termination type
Interactive features: Sortable DataTables allow exploration of results by clicking column headers. Normalized columns (steps/grid_size) facilitate cross-grid comparisons.
These patterns provide insights into the dynamics of sequential random walk simulations and how grid occupancy affects walker behavior.