This vignette explains how to compute statistics for relational
event history data using remstats(), for both tie-oriented
and actor-oriented models. When using the package please cite:
Meijerink-Bosman, M., Back, M., Geukes, K., Leenders, R., & Mulder, J. (2023). Discovering trends of social interaction behavior over time: An introduction to relational event modeling. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01821-8
Relational event modeling approaches enable researchers to
investigate both exogenous and endogenous factors influencing the
evolution of a time-ordered sequence of relational events — commonly
known as a relational event history. These models are
categorized into tie-oriented models, where the probability of
a dyad interacting next is modeled in a single step (e.g., see Butts,
2008), and actor-oriented models, which first model the
probability of a sender initiating an interaction and subsequently the
probability of the sender’s choice of receiver (e.g., see Stadtfeld
& Block, 2017). The R package remstats is
designed to compute a variety of statistics for both types of
models.
The remstats package is part of the
remverse suite of R packages developed at
Tilburg University to support applied researchers in relational event
modeling. For preparing the relational event history,
remstats requires prior processing with
remify::remify(). Model estimation can subsequently be
executed using remstimate.
# Load data
data(history)
data(info)
# Set weights to 1 (no event weighting)
history$weight <- 1
# Define effects
effects <- ~ 1 + send("extraversion", info) + inertia()
# Prepare event history
reh <- remify::remify(edgelist = history, model = "tie")
# Compute statistics
stats <- remstats(reh = reh, tie_effects = effects)
# Estimate model parameters (requires remstimate)
# fit <- remstimate::remstimate(reh = reh, stats = stats, method = "MLE")Relational event history data describes a time-ordered series of interactions between actors in a network. A relational event minimally contains information on the time of the event and the actors involved.
We use the history dataset from the
remstats package (see ?history): a small
simulated relational event history with 115 events among 10 actors. Each
event also records a setting (either "work" or
"social") and an event weight.
head(history)
#> time actor1 actor2 setting weight
#> 1 238 105 113 work 1
#> 2 317 105 109 work 1
#> 3 345 115 112 work 1
#> 4 627 101 115 social 1
#> 5 832 113 107 social 1
#> 6 842 105 109 work 1Exogenous information on actors is stored in the info
object (see ?info), containing age, sex, extraversion, and
agreeableness scores measured at multiple time points for each of the 10
actors:
head(info)
#> name time age sex extraversion agreeableness
#> 1 101 0 0 0 -0.40 -0.14
#> 2 101 9432 0 0 -0.32 -0.26
#> 3 101 18864 0 0 -0.53 -0.45
#> 4 103 0 0 0 -0.13 -0.65
#> 5 103 9432 0 0 -0.43 -0.44
#> 6 103 18864 0 0 -0.13 -0.43We prepare the event history for the tie-oriented model with
remify::remify(). When a weight column is
present in the edgelist, it is used to weight events in the computation
of statistics. Here we set weights to one to avoid this:
Statistics for the tie-oriented model are supplied to the
tie_effects argument of remstats() as a
formula object of the form ~ terms. An
overview of available statistics is obtained with
tie_effects() or ?tie_effects.
As a first example, we request the standardized inertia statistic:
The output is a 3-dimensional array. Rows correspond to unique time points, columns to potential events in the risk set, and slices to statistics:
The number of rows equals reh$M — the number of unique
time points (115 here). When simultaneous events are present, the number
of rows is less than the total number of events, since statistics are
computed once per unique time point. There are 90 columns corresponding
to the \(10 \times 9\) directed dyads
in the full risk set. The two slices are: (1) a baseline statistic,
automatically included when event timing is exact
(ordinal = FALSE), and (2) the inertia statistic. The
baseline can be suppressed by including -1 in the effects
formula.
out
#> Relational Event Network Statistics
#> > Model: tie-oriented
#> > Computation method: per time point
#> > Dimensions: 114 time points x 90 dyads x 2 statistics
#> > Statistics:
#> >> 1: baseline
#> >> 2: inertiaThe risk set composition is stored as an attribute of the output.
Each row describes one dyad, with columns sender,
receiver, type (if event types are present),
and id (the column index in the statistics array):
head(attr(out, "riskset"))
#> sender receiver id
#> 1 101 103 1
#> 2 101 104 2
#> 3 101 105 3
#> 4 101 107 4
#> 5 101 109 5
#> 6 101 111 6Exogenous statistics require an attr_actors object. The
attr_actors can be passed directly inside effect functions
or globally via the attr_actors argument of
remstats(). Statistics are separated by + in
the formula, and interactions are specified with : or
*:
effects <- ~ inertia(scaling = "std") +
send("extraversion", info) +
inertia(scaling = "std"):send("extraversion", info)
out <- remstats(tie_effects = effects, reh = reh)
out
#> Relational Event Network Statistics
#> > Model: tie-oriented
#> > Computation method: per time point
#> > Dimensions: 114 time points x 90 dyads x 4 statistics
#> > Statistics:
#> >> 1: baseline
#> >> 2: inertia
#> >> 3: send_extraversion
#> >> 4: inertia:send_extraversionThe memory argument controls which past events are
included in the computation of endogenous statistics at each time point
\(t\). Four options are available:
"full" (default): the entire event history before \(t\) is included."window": only events within a time window of length
memory_value before \(t\)
are included. For example,
memory = "window", memory_value = 200 includes only events
that occurred at most 200 time units ago."interval": only events within a specific time interval
before \(t\) are included, defined by
memory_value = c(lower, upper). For example,
memory_value = c(100, 200) includes only events between 100
and 200 time units ago."decay": past events are weighted by an exponential
decay function with half-life memory_value. More recent
events receive higher weight.# Full memory (default)
out_full <- remstats(
tie_effects = ~ inertia(),
reh = reh,
memory = "full"
)
# Window memory: only events within the last 500 time units
out_window <- remstats(
tie_effects = ~ inertia(),
reh = reh,
memory = "window",
memory_value = 500
)
# Decay memory: exponential decay with half-life 200
out_decay <- remstats(
tie_effects = ~ inertia(),
reh = reh,
memory = "decay",
memory_value = 200
)consider_typeWhen an event type variable is present in the relational event
history, the consider_type argument in each effect function
controls how the statistic accounts for event types. We illustrate this
using the setting column in history (values:
"work", "social"), passed to
remify() via the event_type argument:
Three options are available for consider_type:
"ignore": event type is not taken into
account. The statistic is computed over all past events regardless of
type. This is the appropriate choice when event types are not a
substantive concern, or when no types are present.
out_ignore <- remstats(
tie_effects = ~ inertia(consider_type = "ignore"),
reh = reh_typed
)
dim(out_ignore) # one inertia slice plus baseline
#> [1] 114 90 2
dimnames(out_ignore)[[3]]
#> [1] "baseline" "inertia""separate": the statistic is computed
separately for each event type, resulting in one slice per type. For
example, with two types "social" and "work",
inertia yields: (1) the number of past "social" events for
each dyad, and (2) the number of past "work" events for
each dyad.
out_separate <- remstats(
tie_effects = ~ inertia(consider_type = "separate"),
reh = reh_typed
)
dim(out_separate) # two inertia slices (one per type) plus baseline
#> [1] 114 90 3
dimnames(out_separate)[[3]]
#> [1] "baseline" "inertia.social" "inertia.work""interact": the statistic conditions on
the type of the most recent event. The statistic at time \(t\) is computed only over past events of
the same type as the event at \(t-1\),
effectively allowing the effect of past event history to depend on the
type of the triggering event.
out_interact <- remstats(
tie_effects = ~ inertia(consider_type = "interact"),
reh = reh_typed
)
#> Warning in prepare_tomstats(effects = effects, reh = reh, attr_actors =
#> attr_actors, : "interact" requires extend_riskset_by_type = TRUE; coercing to
#> "separate".
dim(out_interact)
#> [1] 114 90 3
dimnames(out_interact)[[3]]
#> [1] "baseline" "inertia.social" "inertia.work"For large networks where computing statistics over the full risk set
is computationally expensive, remstats supports
case-control sampling (also known as dyad sampling). Instead of
computing statistics for all dyads in the risk set at each time point, a
random sample of dyads is drawn and used as the comparison set,
substantially reducing computation time.
Case-control sampling is enabled by setting
sampling = TRUE in remstats(). Two additional
arguments control the sampling:
samp_num: the number of dyads to sample per event (must
be ≤ the size of the active risk set).seed: an optional integer random seed for
reproducibility.reh <- remify::remify(edgelist = history, model = "tie")
out_sampled <- remstats(
tie_effects = ~ inertia() + send("extraversion", info),
reh = reh,
sampling = TRUE,
samp_num = 20,
seed = 42
)The output is an object of class tomstats_sampled. Its
structure differs from the standard output: rather than a full array
over all dyads, it stores only the statistics for the sampled dyads plus
the observed event at each time point. This object can be passed
directly to remstimate for model estimation using the
case-control likelihood.
Note that case-control sampling is only available for the tie-oriented model and is not supported for the actor-oriented model.
We prepare the event history for the actor-oriented model. Note that actor-oriented modeling requires directed events:
For the actor-oriented model, statistics must be specified separately for the two modeling steps:
sender_effects — models which actor sends the next
event.receiver_effects — models which actor the sender chooses as
receiver.An overview of available statistics for each step is obtained with
actor_effects() or ?actor_effects.
We start by requesting the outdegreeSender statistic for
the sender step:
The output for the actor-oriented model is a list with two elements:
Since no statistics were requested for the receiver step,
receiver_stats is empty. The sender_stats
object contains the baseline statistic (automatically included when
ordinal = FALSE) and the requested
outdegreeSender statistic:
out
#> Relational Event Network Statistics
#> > Model: actor-oriented
#> > Computation method: per time point
#> > Sender model:
#> >> Dimensions: 114 time points x 10 actors x 2 statistics
#> >> Statistics:
#> >>> 1: baseline
#> >>> 2: outdegreeSenderThe dimensions of out$sender_stats are 115 × 10 × 2:
rows are unique time points, columns are potential senders, and slices
are statistics.
We extend the model with a statistic for the receiver choice step:
The statistics for the receiver choice step are accessed with
out$receiver_stats. The baseline statistic is not defined
for the receiver step, so its dimensions are 115 × 10 × 1: rows are time
points, columns are potential receivers, and slices are statistics.
The computed values in the receiver choice step give the statistic for each potential receiver, given the current sender. For example:
# Label columns with receiver names and rows with senders
dimnames(out$receiver_stats)[[2]] <- reh$meta$dictionary$actors$actorName
dimnames(out$receiver_stats)[[1]] <- reh$edgelist$actor1[-1]
head(out$receiver_stats[,, "inertia"])
#> 101 103 104 105 107 109 111 112 113 115
#> 105 0 0 0 0 0 0 0 0 1 0
#> 115 0 0 0 0 0 0 0 0 0 0
#> 101 0 0 0 0 0 0 0 0 0 0
#> 113 0 0 0 0 0 0 0 0 0 0
#> 105 0 0 0 0 0 1 0 0 1 0
#> 113 0 0 0 0 1 0 0 0 0 0At the first time point, inertia is zero for all receivers because no prior events have occurred. At the second time point, inertia is 1 for the receiver of the first event (given the same sender). At the third time point, the sender is a different actor with no prior sending history, so inertia is again zero for all receivers.