WFU Grid Distribution Tool

This is a generic grid-processing architecture GUI that allows automated 
distribution of MATLAB programs or scripts (Fig. 3). It is particularly useful for 
programs that need to run on many files or images (for example, preprocessing 
of MRI images for a VBM analysis on 30 subjects). The only requirement of the 
MATLAB script to be distributed is that it be structured to take an index of 
values, a binary image mask of voxels, or a list of filenames to operate on as 
input. Our architecture takes advantage of the Sun Grid Engine (SGE) “array 
job” task distribution feature: the grid distribution tool takes any 
appropriately written MATLAB function and distributes it on a computing 
cluster as an SGE array job. Input to the grid 
distribution tool includes the name of the program to be distributed, the 
distribution index type and size, the input parameter file for distribution, 
and the output directory. The input program should allow its first parameter 
to be a list of filenames (“File list”), an image mask or array of indices 
(“Image”), or simply an array containing the iteration number (“Loop”). 
Individual jobs can 
be partitioned based on “point” (single index or file at a time), “slicewise” 
for image volumes, or by vector size (e.g. ten files or voxels with each job). 
The distribution tool “option” selection (file list, image, loop) defines the 
nature and bounds of the input data (e.g., how many total instantiations for 
the whole job). The file list option is for a list of filenames with the 
bounds being the number of files in the list. The image option implies a 
single 3D image volume as input with the procedure performed on all voxels in 
the image. The loop option is for a procedure that has no external input with 
the bounds defined as the number of loops (iterations). The loop option can be 
used to define a specific number of iterations to perform, and then stop (as 
for Monte Carlo simulations). The partition choices (point, slice, and vector) 
define how many instantiations of the procedure are to be performed on each 
node. The vector option partitions jobs to each node based on the vector size. 
Point and slice partitioning are included as separate options for convenience. 
A vector size of 1 is equivalent to point partitioning. Slice is a partitioning 
scheme specific to 3D image inputs (partitioning to each node based on one 
image slice at a time). Slice partitioning can also be performed using the 
vector option, if the number of voxels in a slice is used as the vector size. 
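
The relationship between the option bounds and the partition choice can be 
illustrated with a short sketch. The Python below is not part of the tool. The 
function names partition and slice_vector_size, the 1-based inclusive index 
ranges, and the slice-by-slice voxel ordering are assumptions made for 
illustration. It computes which indices each array-job task would handle for a 
given total count and vector size.

```python
import math


def partition(total, vector_size=1):
    """Split indices 1..total into consecutive chunks of at most vector_size.

    Returns a list of (first, last) pairs, 1-based and inclusive, one per
    array-job task. vector_size=1 corresponds to "point" partitioning.
    """
    if total < 1 or vector_size < 1:
        raise ValueError("total and vector_size must be positive")
    n_tasks = math.ceil(total / vector_size)
    return [
        (task * vector_size + 1, min((task + 1) * vector_size, total))
        for task in range(n_tasks)
    ]


def slice_vector_size(shape):
    """Vector size that reproduces "slice" partitioning for a 3D volume.

    Assumes shape is (nx, ny, nz) and voxels are indexed slice by slice.
    """
    nx, ny, _nz = shape
    return nx * ny


# File list of 100 files, ten files per task -> 10 tasks.
file_tasks = partition(100, vector_size=10)

# Hypothetical 4 x 4 x 3 image: one slice (16 voxels) per task -> 3 tasks.
image_tasks = partition(4 * 4 * 3, slice_vector_size((4, 4, 3)))
```

In this sketch the last task simply receives fewer items when the total is not 
a multiple of the vector size.
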
For example, a program can be written that normalizes anatomic data to a T1 
template in which the input is a list of files to operate on: 
normalize(filenames). If this program were called with a list of 100 files, 
it might take days to complete serially. Using our grid distribution tool, this 
same task can be accomplished in a fraction of the time, without any 
additional coding. The grid distribution tool is launched within MATLAB. It 
generates a k-shell wrapper for the MATLAB script to be distributed, sets up 
the input parameters, and submits the wrapper to the SGE in the background as 
an array job. The array job feature is designed to run 
multiple instantiations of the same program on a cluster. The inputs to each 
instantiation are determined on the basis of shell environment parameters 
defining that particular instantiation. For the previously described 
normalization procedure to be performed on 100 files, each instantiation of 
the program would have an instantiation id number assigned to it by the grid 
(e.g. 1 to 100), which is read by the k-shell wrapper from a shell environment 
variable and used to define the respective input filename for the MATLAB 
script being distributed.
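
The mapping from instantiation id to input file can be sketched as follows. 
This is an illustrative Python stand-in for the k-shell wrapper, not the 
wrapper the tool generates. SGE sets the SGE_TASK_ID environment variable for 
each task of an array job submitted with qsub -t. The function names, the 
input file names, and the script name wrapper.ksh are hypothetical.

```python
import os


def files_for_task(filenames, task_id, vector_size=1):
    """Return the filenames handled by one array-job task.

    task_id is 1-based, as SGE array-job task ids are. Each task takes
    vector_size consecutive entries of the list.
    """
    if task_id < 1:
        raise ValueError("task_id must be >= 1")
    start = (task_id - 1) * vector_size
    return filenames[start:start + vector_size]


def array_job_command(n_tasks, wrapper="wrapper.ksh"):
    """Build (but do not run) a qsub call for n_tasks array-job tasks."""
    return ["qsub", "-t", f"1-{n_tasks}", wrapper]


# Hypothetical list of 100 input files.
filenames = [f"subject_{i:03d}.img" for i in range(1, 101)]

# On the cluster SGE sets SGE_TASK_ID; default to 1 so the sketch runs locally.
task_id = int(os.environ.get("SGE_TASK_ID", "1"))
my_files = files_for_task(filenames, task_id)
```

With one file per task, the submission would cover task ids 1 to 100, and the 
task with id 42 would process the 42nd file in the list.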
