Introduction
Relational event modeling approaches enable researchers to
investigate both exogenous and endogenous factors influencing the
evolution of a time-ordered sequence of relational events - commonly
known as a relational event history. These models are
categorized into tie-oriented models, where the probability of
a dyad interacting next is modeled in a single step (e.g., see Butts,
2008), and actor-oriented models, which first model the
probability of a sender initiating an interaction and subsequently the
probability of the senders’ choice of receiver (e.g., see Stadtfeld
& Block, 2017). The R
package remstats
is
designed to compute a variety of statistics for both types of
models.
The remstats
package is part of a bundle of
R
-packages developed by researchers from Tilburg University
intended to aid applied researchers in the application of relational
event modeling. For preparing the relational event history,
remstats
assumes the prior application of
remify::remify()
(available on CRAN). Model estimation can
subsequently be executed using remstimate
(available on
GitHub at github.com/TilburgNetworkGroup/remstimate).
The following provides a brief introduction to computing statistics for relational event history data with remstats. We’ll begin with a quick workflow example, followed by a detailed description of the procedure for the tie-oriented model. Finally, we’ll provide an explanation of the procedure for the actor-oriented model.
Workflow example
# Load data
data(history)
data(info)
# Define effects
effects <- ~ 1 + send("extraversion", info) + inertia()
# Prepare event history with the 'remify package'
rehObject <- remify::remify(edgelist = history, model = "tie")
# Compute statistics
statsObject <- remstats(reh = rehObject, tie_effects = effects)
# Estimate model parameters with the 'remstimate' package
# fit <- remstimate::remstimate(reh = rehObject, stats = statsObject,
# method = "MLE", timing = "interval")
Getting started
Data
Relational event history data describes a time-ordered series of interactions between actors in a network. Such interactions are referred to as relational events. A relational event minimally contains information on the time of the event and the actors that are involved in the event.
As an illustration, we use the history
data object in
the remstats
package. This data object is a randomly
generated relational event history. A description of the simulated data
can be accessed with ?history
. Here, we read that
history
is a small simulated relational event history with
115 events. Besides information on the time and actors, for each event
there is also information on the setting and an event weight. We can
view the first six events with:
head(history)
#> time actor1 actor2 setting weight
#> 1 238 105 113 work 1.33
#> 2 317 105 109 work 1.64
#> 3 345 115 112 work 1.82
#> 4 627 101 115 social 1.25
#> 5 832 113 107 social 1.67
#> 6 842 105 109 work 2.30
We prepare the relational event history for computation of statistics
for the tie-oriented model with the remify
function from
the remify
package. Whenever the weight variable is present
in the edgelist
supplied to remify
, it assumes
that we want to use these to weight the events in the computation of the
statistics. In this example, we don’t want this, thus we set the weight
to one for all events:
history$weight <- 1
reh <- remify::remify(edgelist = history, model = "tie")
Besides the relational event history itself, relational event
modeling often requires a second data object with exogenous information
on the actors in the network. Information on the actors in the simulated
data example in remstats
is stored in the info
object. A description of the info
data can be accessed with
?info
. Here, we read that the info
object
stores for the 10 actors in the network information on their age, sex,
extraversion and agreeableness score. Moreover, extraversion and
agreeableness is measured multiple times during the observation period.
The time variable tells us when the values change. We can view the
attribute information for the first two actors with:
head(info)
#> name time age sex extraversion agreeableness
#> 1 101 0 0 0 -0.40 -0.14
#> 2 101 9432 0 0 -0.32 -0.26
#> 3 101 18864 0 0 -0.53 -0.45
#> 4 103 0 0 0 -0.13 -0.65
#> 5 103 9432 0 0 -0.43 -0.44
#> 6 103 18864 0 0 -0.13 -0.43
Compute statistics for the tie-oriented model
First, we compute statistics for modeling relational event history
data with a tie-oriented model. The statistics that are requested need
to be supplied to the tie_effects
argument of
remstats()
, specified in an object of class
formula
. This specification should be in the form
~ terms
.
An overview of the statistics that can be computed for the
tie-oriented model is available using the tie_effects()
function or its help documentation ?tie_effects
:
In this illustration, we start with requesting only one statistic:
the inertia statistic. Most statistics can be tailored to the user’s
needs. For example, lets view the description for the
inertia
statistic using ?inertia
. Here, we can
read that the inertia statistic computes for every timepoint t
for every pair of actors (i,j) in the riskset the number of
past events. With the scaling
argument, one of the methods
for scaling the statistic can be chosen. The consider_type
argument is relevant when event types are in the dependent variable,
which we do not consider in this example.
In this illustration, we will standardize the inertia statistic. To request this statistic, we define the formula as follows:s
effects <- ~ inertia(scaling = "std")
Now, we have everything we need to compute our first statistic:
out <- remstats(tie_effects = effects, reh = reh)
The remstats()
function outputs a 3-dimensional array
with statistics for the tie-oriented model. On the rows of this array
are the timepoints, the columns refer to the potential events in the
riskset and the slices refer to the different statistics:
dim(out)
#> [1] 115 90 2
Our statistics object has 115 rows, corresponding to the 115 time
points in the relational event history. It has 90 columns, corresponding
to the 90 events in the riskset. The statistics object has two slices,
that is because the baseline statistics is automatically computed when
the timing of the events in the relational event history is exact
(unless removed by specifying -1
in the
effects
formula) and saved in the first slice. The
remstats()
procedure assumes that the timing of the events
in the relational event history is exact and the full likelihood is used
in the estimation, unless the argument ordinal
in
remify::remify()
is set to TRUE
.
We can view the names of the statistics that are in the statistics object with:
out
#> Relational Event Network Statistics
#> > Model: tie-oriented
#> > Computation method: per time point
#> > Dimensions: 115 time points x 90 dyads x 2 statistics
#> > Statistics:
#> >> 1: baseline
#> >> 2: inertia
Here, we see that, indeed, a baseline and inertia statistic are computed.
Since we did not request anything special for the riskset in
remify::remify()
, it consists of every directed pair of
actors observed in the relational event history, which is 10*9 = 90
pairs. These pairs are saved in the riskset
attribute. We
can ask for the first few lines of this riskset:
head(attr(out, "riskset"))
#> sender receiver id
#> 1 101 103 1
#> 2 101 104 2
#> 3 101 105 3
#> 4 101 107 4
#> 5 101 109 5
#> 6 101 111 6
Here, we see that the first event in the riskset is the event were actor 101 sends an interaction directed towards actor 103. The id column refers to the column in the statistic object that contains the statistic(s) for this specific dyad. The first column in the statistic object refers to this first event in the riskset, the second column in the statistic object to the second event in the riskset, and so forth.
Inertia is an example of an endogenous statistic: it is a
function of the relational event history itself. Next, we are going to
add a request for an exogenous statistic. For this we need the
exogenous information on the actors in the info
object.
As an illustration, we are going to request the statistic for an
effect of extraversion on sending events, i.e., a send effect. The
description of a send effect is accessed with ?send
. Here,
we read that we need to supply the variable for which we want to specify
a sender effect and that this variable should correspond to a column in
the attr_actors
object that we supply. Thus, we specify a
send effect of extraversion with
send("extraversion", attr_actors = info)
. Here, we specify
the attr_actors
object within the send()
function. Alternatively, it can be supplied to remstats()
.
This is for example useful if you want to compute a bunch of exogenous
statistics using the same attr_actors
object.
Statistics in the effects
formula should be separated
with the +
. Finally, we add an interaction between the
inertia()
statistic and the send()
statistic.
This can be done by using the *
or :
operator:
Compute statistics for the actor-oriented model
The procedure to compute statistics for the actor-oriented model is
largely similar to what is written above, except that statistics have to
be specified separately for the sender activity rate step of the model
and the receiver choice step of the model. The statistics requested for
these two modeling steps need to be supplied to two different effects
arguments, namely sender_effects
and
receiver_effects
, respectively.
An overview of the statistics that are available for the
actor-oriented model in the two modeling steps can be obtained using the
actor_effects()
function or its help documentation
?actor_effects
.
In this illustration, we start with requesting only one statistic for
the sender activity rate step: the outdegreeSender statistic.
First, lets view the description for the outdegreeSender
statistic using ?outdegreeSender
. Here, we can read that,
in the sender activity rate step of the actor-oriented model, the
outdegreeSender statistic computes for every timepoint t for
every actors i the number of outgoing past events. With the
scaling
argument, one of the methods for scaling the
statistic can be chosen.
First, we prepare the event history for computing statistics for an actor-oriented model:
reh <- remify::remify(edgelist = history, model = "actor")
To compute the outdegreeSender statistic for the sender activity rate
step we supply it to the sender_effects
argument of
remstats()
:
effects <- ~ outdegreeSender()
out <- remstats(sender_effects = effects, reh = reh)
The outputted remstats object is now a list with two elements:
sender_stats
and receiver_stats
:
names(out)
#> [1] "sender_stats" "receiver_stats"
Since we did not request any statistics for the receiver choice step
here, the receiver_stats
object is empty. The
sender_stats
object contains the statistic array with the
baseline
statistic (again, automatically computed unless
ordinal = TRUE
), and the requested
outdegreeSender
statistic:
out
#> Relational Event Network Statistics
#> > Model: actor-oriented
#> > Computation method: per time point
#> > Sender model:
#> >> Dimensions: 115 time points x 10 actors x 2 statistics
#> >> Statistics:
#> >>> 1: baseline
#> >>> 2: outdegreeSender
The dimension of out$sender_stats
is 115 x 10 x 2. On
the rows we have the timepoints, the columns refer to the actors that
can be senders and the slices refer to the different statistics.
Lets extend our model and also request a statistic for the receiver choice step:
sender_effects <- ~ outdegreeSender()
receiver_effects <- ~ inertia()
out <- remstats(sender_effects = sender_effects, receiver_effects = receiver_effects, reh = reh)
We can access the statistic computed for the receiver choice step
with out$receiver_stats
. In this step, the baseline
statistic is not automatically computed (and not defined). Hence, the
dimensions of the statistics object for the receiver choice step are 115
x 10 x 1. On the rows we have again the timepoints, on the columns now
the receivers and on the slices the statistics.
Note that the computed values of the statistic in the receiver choice step are equal to the values for this receiver, given the current sender. For example, lets review the first six lines:
# Set the column names equal to the receivers
colnames(out$receiver_stats) <- attributes(reh)$dictionary$actors$actorName
# Set the rownames equal to the senders
rownames(out$receiver_stats) <- reh$edgelist$actor1
# View the first six lines
head(out$receiver_stats[,,"inertia"])
#> 101 103 104 105 107 109 111 112 113 115
#> 105 0 0 0 0 0 0 0 0 0 0
#> 105 0 0 0 0 0 0 0 0 1 0
#> 115 0 0 0 0 0 0 0 0 0 0
#> 101 0 0 0 0 0 0 0 0 0 0
#> 113 0 0 0 0 0 0 0 0 0 0
#> 105 0 0 0 0 0 1 0 0 1 0
At the first timepoint, the inertia statistic for all receivers given the current sender (actor 105) is zero because no prior events have occurred. At the second timepoint, the sender is again actor 105. Now the inertia statistic is equal to the 1 for the receiver of the first event (actor 113). At the third timepoint, the inertia statistic is again zero for all receivers because now the sending actor is 115, who did not send any prior events.