bxh_eventstats usage and implementation


bxh_eventstats is a program that collects subsets of one or more
4-dimensional time-series of volumes.  Each subset represents a time
interval (epoch) surrounding the onset time of an _event_.  A given
collection of these subsets, when averaged, is called an _epoch
average_ or _bin_.  The activity reflected in these bins, which are
also time-series, is then optionally correlated to a template
waveform (usually modeled on a hemodynamic response) or to other bins
using t-tests.  An averaging epoch is specified as a number of TRs
before and after an event.  If an event onset does not occur exactly
at the time of an image acquisition then the images in the epoch are
estimated using cubic-spline interpolation.

An event is defined as a time interval (represented by onset and
duration) during which one or more named parameters have a given
value.  For example, an event may describe the presentation of a
stimulus, like a picture of a red circle, with onset time 8.5s and
with a duration of 3 seconds.  The parameters that describe this event
may be color and shape (whose values are 'red' and 'circle'
respectively), as well as any other user-specified codes one may wish
to assign to the event.  These events are stored in one or more XML
files.  Multiple event files can be applied to one time-series.

One specifies the time points that go into a given bin by providing
one or more _queries_, which select events that match certain
characteristics.  Each bin is parameterized by up to three queries.
The initial list of time points representing the bin consists of
the onset times of the events that match the _primary query_.  One may
wish to filter this list further by only accepting those time points
that overlap other events with particular characteristics --
essentially choosing a list of acceptable time intervals -- this can
be done by specifying a _filter query_.  One may also wish to exclude
time points whose epochs might overlap other events matching certain
characteristics -- essentially choosing a list of unacceptable time
intervals -- this is done with an _epoch exclusion query_.  The
primary query is applied to the original event list.  Both filter and
epoch exclusion queries are applied to a "transition event list",
which is generated by merging all simultaneous/overlapping events.

As an example, consider the following timeline:

|---------LEARN---------|------------------PROBE----------------|
|---A---|---B---|---C---|---C---|---A---|---B---|---B---|---A---|
0-------10------20------30------40------50------60------70------80----> t
                              ||
                              \/
                             spike

LEARN and PROBE represent a high-level division of an example task
into two blocks.  The LEARN block has the following time interval
[0,30), and PROBE is [30,80).  A, B, and C represent the presentation
of particular stimuli, where they correspond to a red circle, blue
square, and green square respectively.  Assume that images were
acquired every 2 seconds.  Also assume that an exclusionary event,
such as a spike, happened at time 36, which caused the mean intensity
of that volume to be more than 3 standard deviations from the mean
(across all acquired volumes).  The following example event list
stores these events and their corresponding parameter lists.

onset  duration  params
    0        30  blockname=LEARN
   30        50  blockname=PROBE
    0        10  shape=circle, color=red
   10        10  shape=square, color=blue
   20        10  shape=square, color=green
   30        10  shape=square, color=green
   40        10  shape=circle, color=red
   50        10  shape=square, color=blue
   60        10  shape=square, color=blue
   70        10  shape=circle, color=red
   36         2  mean_intensity_z_score=4.3

If you are interested in creating a bin to represent the onsets of all
square-shaped stimuli the primary query could be "shape==square".  If
you were only interested in those square-shaped stimuli that occurred
during a PROBE block, you could specify an additional filter query
"blockname==PROBE".  This would then match the events occurring at time
points 30, 50, and 60, ignoring the squares at times 10 and 20 because
they are in the LEARN block.
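
The selection above can be sketched in a few lines of Python.  This is
only an illustration, not the program's own code: the tuple layout, the
select_onsets helper, and the use of Python callables in place of query
strings are all assumptions made for the example.  For simplicity the
filter is matched against the original events rather than a transition
event list, which gives the same intervals for a single-parameter
filter like this one.

```python
# Hypothetical in-memory form of the example event list (not the XML format).
EVENTS = [
    # (onset, duration, params)
    (0, 30, {"blockname": "LEARN"}),
    (30, 50, {"blockname": "PROBE"}),
    (0, 10, {"shape": "circle", "color": "red"}),
    (10, 10, {"shape": "square", "color": "blue"}),
    (20, 10, {"shape": "square", "color": "green"}),
    (30, 10, {"shape": "square", "color": "green"}),
    (40, 10, {"shape": "circle", "color": "red"}),
    (50, 10, {"shape": "square", "color": "blue"}),
    (60, 10, {"shape": "square", "color": "blue"}),
    (70, 10, {"shape": "circle", "color": "red"}),
    (36, 2, {"mean_intensity_z_score": 4.3}),
]


def select_onsets(events, primary, filt=None):
    """Return onsets of events matching `primary`, keeping only those that
    fall inside some event matching `filt` (intervals are half-open)."""
    onsets = sorted(on for on, dur, p in events if primary(p))
    if filt is None:
        return onsets
    allowed = [(on, on + dur) for on, dur, p in events if filt(p)]
    return [t for t in onsets if any(a <= t < b for a, b in allowed)]


# Primary query only: "shape==square"
squares = select_onsets(EVENTS, lambda p: p.get("shape") == "square")

# Primary query plus filter query "blockname==PROBE"
probe_squares = select_onsets(
    EVENTS,
    lambda p: p.get("shape") == "square",
    lambda p: p.get("blockname") == "PROBE",
)

print(squares)        # [10, 20, 30, 50, 60]
print(probe_squares)  # [30, 50, 60]
```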

Another bin might include red circles, and the primary query would be
"color==red & shape==circle".  Assume that you have specified the
averaging epoch to include not only the image acquired at the same
time as the event, but also the 2 images before and the 3 images after
the event (i.e. a total epoch duration of 12 seconds).  If you wanted
to exclude any time point in the primary query whose averaging epoch
includes an image whose mean intensity is more than 3 standard
deviations from the mean, you could specify an epoch exclusion query
of "mean_intensity_z_score > 3", which would allow time points 0 and
70 but remove time point 40, whose epoch begins at the spike at
time 36.
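
As a rough check of this arithmetic, the following Python sketch
computes the epoch around each red-circle onset and drops those that
touch the spike interval.  The helper names and the exact treatment of
interval boundaries are assumptions made for the example; the program
itself may handle edge cases differently.

```python
TR = 2.0          # seconds between image acquisitions (from the example)
PTS_BEFORE = 2    # images before the event image
PTS_AFTER = 3     # images after the event image


def epoch_span(onset):
    """Times of the first and last image in the epoch around `onset`."""
    return onset - PTS_BEFORE * TR, onset + PTS_AFTER * TR


def overlaps(span, interval):
    """True if the closed span [s, e] touches the half-open interval [a, b)."""
    (s, e), (a, b) = span, interval
    return s < b and a <= e


red_circle_onsets = [0.0, 40.0, 70.0]   # matches of the primary query
excluded = [(36.0, 38.0)]               # events with mean_intensity_z_score > 3

kept = [t for t in red_circle_onsets
        if not any(overlaps(epoch_span(t), iv) for iv in excluded)]

print(epoch_span(40.0))  # (36.0, 46.0)
print(kept)              # [0.0, 70.0]
```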

You may ask why there is a separate primary query and filter query;
why can't they be combined into one?  Consider the following event
list where the background alternates between red and blue, and a
picture is displayed alternately in the upper (up) and lower (down)
halves of the display.  Background changes are recorded in
separate events from the picture stimulus events.

background: |--RED-|-BLUE-|--RED-|-BLUE-|-RED-|-BLUE-|
field:      |---up---|--down--|---up---|--down--|-up-|
                                 |--1--|        |-2--|

Let's say you were interested in the time points that correspond to
the onset of stimuli in the upper half of the display, but only if the
background was blue at the time of onset.  First of all, because the
field and background parameters are in separate events, there is no
one event that matches the query "field==up & background==BLUE".  Even
ignoring that for the moment, if we tried to find those time intervals
for which both conditions were true (labeled 1 and 2 above), only the
second interval matches what we really wanted, because the onset of
the up stimulus corresponding to interval 1 actually happened during a
red background.  But there is not enough information in the above
query to tell you that the onset time of the up/down events is
important; it merely says I want an event where the field parameter is
up and the background is BLUE.  However, by specifying two queries --
one to specify the actual events whose onset times we care about
("field==up"), and another to filter this by other conditions that
need to happen at the same time ("background==BLUE") -- we can get the
intended output.
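
The difference between the two approaches can be seen in a small
Python sketch.  The timeline above gives no absolute times, so the
onsets and durations below are hypothetical values chosen to follow
its layout; the tuple format and the matches helper are likewise
assumptions for illustration only.

```python
# Hypothetical times; the timeline above shows only the relative layout.
background = [  # (onset, duration, params)
    (0, 7, {"background": "RED"}),
    (7, 7, {"background": "BLUE"}),
    (14, 7, {"background": "RED"}),
    (21, 7, {"background": "BLUE"}),
    (28, 6, {"background": "RED"}),
    (34, 7, {"background": "BLUE"}),
]
field = [
    (0, 9, {"field": "up"}),
    (9, 9, {"field": "down"}),
    (18, 9, {"field": "up"}),
    (27, 9, {"field": "down"}),
    (36, 5, {"field": "up"}),
]
events = background + field


def matches(params, name, value):
    return params.get(name) == value


# One combined query: no single event carries both parameters.
combined = [on for on, dur, p in events
            if matches(p, "field", "up") and matches(p, "background", "BLUE")]

# Primary query picks the onsets; filter query picks the allowed intervals.
onsets = [on for on, dur, p in events if matches(p, "field", "up")]
allowed = [(on, on + dur) for on, dur, p in events
           if matches(p, "background", "BLUE")]
selected = [t for t in onsets if any(a <= t < b for a, b in allowed)]

print(combined)  # []
print(selected)  # [36]
```

Only the last "up" stimulus begins while the background is blue, so it
is the only onset that survives the filter.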


Usage
-----

bxh_eventstats [opts] outputprefix imgfile1 eventfile1a[,eventfile1b...] [imgfile2 eventfile2a[,eventfile2b...] ...]

This program "queries" a 4-D data set (with corresponding event lists)
and produces averages of all time courses surrounding each event that
matches the query.  Multiple independent queries may be specified, and
the width and position of each time course relative to the event is
also user-specified.  Multiple event files corresponding to the same
image data can be specified separated by commas (the filenames/paths
themselves are therefore prohibited from containing commas).  This
program also correlates the time series of each voxel in a 4-D time
series of volumes (inputxmlfile) with a given "template" vector
(specified with --template option).  Outputs (in FILE_cor.bxh and
FILE_tmap.bxh) are 3-D data sets storing the correlation coefficient
(r) and the corresponding t-statistic (derived from r).  T-statistics
for the comparison between two queries are also supported (using the
--tcompare option).
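
For reference, the usual way to derive a t-statistic from a Pearson
correlation coefficient r over n samples is t = r * sqrt((n-2)/(1-r^2)),
with n-2 degrees of freedom.  The Python sketch below shows that
standard conversion; it is assumed, not confirmed, that the program
uses the same formula, and the function name is made up for the
example.

```python
import math


def t_from_r(r, n):
    """Standard conversion of a Pearson r over n samples to a t-statistic
    with n - 2 degrees of freedom (assumed, not taken from the program)."""
    if n < 3:
        raise ValueError("need at least 3 samples")
    if abs(r) >= 1.0:
        # A perfect correlation has an unbounded t-statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


print(round(t_from_r(0.5, 102), 4))  # 5.7735
```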

Primary queries are specified using the --query option.
Filter queries are specified using the --filterquery option.
Epoch exclusion queries are specified using the --epochexcludequery option.

The option --help gives a detailed list of all options.


Implementation details
----------------------

The primary query is applied to the original event list to create a
list of candidate time points for the bin.  Then a "transition" event
list is generated from the original event list by merging
simultaneous/overlapping events in such a way that produces a list of
events recording all the parameters in effect at any time.  For
example, the following two events:

   |----A---|
       |----B---|
 |---------------------> t

would be converted into three "transition" events:

   |-A-|-AB-|-B-|
 |---------------------> t

where the AB event contains all the parameters of the original A and B
events.  This "transition" event list is used for the filter query and
epoch exclusion query.  The filter query, in essence, specifies a list
of acceptable time intervals; any candidate time point that is not
contained within a transition event matched by the filter query is
removed from the candidate list.  The epoch exclusion query specifies
a list of dangerous time intervals; any candidate time point whose
averaging epoch overlaps with any transition event matched by the
epoch exclusion query is removed from the candidate list.

We generate a transition event list by sorting the original event list
by onset, then by duration.  If there are multiple event lists
(corresponding to the same time duration), then they are concatenated
before sorting.  Then we gradually convert the list into a
transition event list by placing a cursor at the first event in the
list and doing the following:

While the cursor points to a valid event, do:
  Let A be the event pointed to by the cursor.
  Let B be the event immediately following A in the sorted list;
  if there is no such event, stop.
  Pre-condition: Events preceding A in the sorted list are transition events
  Pre-condition: A starts no later than B (due to sorting constraints).
  Pre-condition: If A starts at the same time as B, A ends no later than B
                 (ditto).
  If A and B do not overlap
    move the cursor to B
  Else if A and B are equivalent intervals
    copy parameters of B into A
    remove B from sorted list
    keep cursor at A
  Else if A starts at the same time as B and A is a single point (duration 0)
    copy parameters of B into A
    move the cursor to B
  Else if A [x,y) starts at the same time as B [x,z)
    split B into two intervals C [x,y) and D [y,z), copying all parameters
    remove B from sorted list
    insert C into sorted list (after A)
    insert D into sorted list
    keep cursor at A
  Else if A [w,x) overlaps and starts before interval B [y,z)
    split A into two intervals C [w,y) and D [y,x), copying all parameters
    remove A from sorted list
    insert C into sorted list
    insert D into sorted list
    move the cursor to the event immediately following C in the sorted list
  Repeat

The end result is a list of events for which, at any given time point,
there exists at most one event which stores all the parameters in
effect at that time point.
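
The procedure above can be sketched in Python as follows.  This is an
illustrative reading of the steps, not the program's source: the
(onset, duration, params) tuple layout and the function name are
assumptions, and the list is simply re-sorted after each split rather
than inserting at the exact position.  Two events that start at the
same time are treated as overlapping even when the first has zero
duration, which is how the single-point case above is reached.

```python
def transition_events(events):
    """Merge (onset, duration, params) events into non-overlapping
    (start, end, params) transition events, following the steps above."""
    def key(e):
        return (e[0], e[1])   # sort by onset, then by end (i.e. duration)

    evs = sorted(([on, on + dur, dict(p)] for on, dur, p in events), key=key)
    i = 0                     # the cursor
    while i + 1 < len(evs):
        a, b = evs[i], evs[i + 1]
        same_start = a[0] == b[0]
        if not same_start and a[1] <= b[0]:
            i += 1                              # A and B do not overlap
        elif same_start and a[1] == b[1]:
            a[2].update(b[2])                   # equivalent intervals: merge
            del evs[i + 1]
        elif same_start and a[0] == a[1]:
            a[2].update(b[2])                   # A is a single point
            i += 1
        elif same_start:
            # A [x,y) and B [x,z): split B at y, keep the cursor at A.
            evs[i + 1] = [b[0], a[1], dict(b[2])]
            evs.append([a[1], b[1], dict(b[2])])
            evs.sort(key=key)                   # stable, so A stays at i
        else:
            # A [w,x) starts before B [y,z): split A at y.
            evs[i] = [a[0], b[0], dict(a[2])]
            evs.append([b[0], a[1], dict(a[2])])
            evs.sort(key=key)
            i += 1                              # event following C
    return [(s, e, p) for s, e, p in evs]


# The two overlapping events A and B from the diagram, with made-up times.
result = transition_events([
    (2, 6, {"name_a": "A"}),
    (5, 6, {"name_b": "B"}),
])
for start, end, params in result:
    print(start, end, sorted(params))
# 2 5 ['name_a']
# 5 8 ['name_a', 'name_b']
# 8 11 ['name_b']
```

A filter or epoch exclusion query can then be evaluated against the
params of each returned interval, since each interval carries every
parameter in effect during it.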
