COMPARE: Classification Of Morphological Patterns using Adaptive Regional Elements


Contents

    * 1 Introduction
          o 1.1 Feature extraction
          o 1.2 Feature selection and classification
          o 1.3 Classifier application
    * 2 Description of COMPARE package
          o 2.1 readme file
          o 2.2 sub-folders
    * 3 Code structure of the software package
          o 3.1 Script files: Compare and Compare_test
          o 3.2 Utility files: nrutil_compare.h nrutil_compare.cc
          o 3.3 Class definition and implementation:
          o 3.4 List of parameters in Compare:
          o 3.5 List of parameters in Compare_test:
    * 4 Compilation and Installation of COMPARE package
          o 4.1 Platform
          o 4.2 Download
          o 4.3 Content in COMPARE
          o 4.4 Compiling
          o 4.5 Setting
          o 4.6 Testing the package
                + 4.6.1 Build a classifier:
                + 4.6.2 Use the classifier to classify other samples:
    * 5 Usage of the COMPARE package
          o 5.1 Preparation of Data
          o 5.2 Classifier Construction and Application
          o 5.3 Example for classification of female Schizophrenia subjects
    * 6 Details of the COMPARE package
          o 6.1 Feature Extraction
          o 6.2 Feature Selection and Classifier Construction
          o 6.3 Classifier Application
          o 6.4 Group Difference
          o 6.5 ROC curve
    * 7 Reference

1. Introduction

This document describes the procedure for brain image classification using the COMPARE software package. The related algorithms are introduced in the following two papers:

Yong Fan, Dinggang Shen, Ruben C. Gur, Raquel E. Gur, and Christos Davatzikos, COMPARE: Classification Of Morphological Patterns using Adaptive Regional Elements, IEEE Trans. on Medical Imaging, 26(1): 95-105, January 2007.

Yong Fan, Dinggang Shen, and Christos Davatzikos, "Classification of Structural Images via High-Dimensional Image Warping, Robust Feature Extraction, and SVM", MICCAI, Palm Springs, California, USA, Oct 26~29, 2005.

COMPARE is a method for classifying structural brain magnetic resonance (MR) images that combines deformation-based morphometry with machine learning. Before classification, a morphological representation of the anatomy of interest is obtained from structural MR brain images using a high-dimensional mass-preserving template warping method [1, 2]. Regions that display strong correlations between tissue volumes and classification (clinical) variables, learned from training samples, are extracted using a watershed segmentation algorithm. To achieve robustness to outliers, the regional smoothness of the correlation map is estimated by a cross-validation strategy. A volume increment algorithm is then applied to these regions to extract regional volumetric features. To improve the efficiency and generalization ability of the classification, a feature selection technique using Support Vector Machine (SVM)-based criteria selects the most discriminative features according to their effect on the upper bound of the leave-one-out generalization error. Finally, SVM-based classification is applied using the best set of features and is tested using a leave-one-out cross-validation strategy. Although the algorithm is designed for structural brain image classification, it is readily applicable to functional brain image classification given proper feature images. For simplicity, we focus here on structural brain image classification.

The COMPARE software package consists of two parts: classifier construction and classifier application. Classifier construction includes feature extraction, feature selection, and SVM classifier training, whose parameters are determined by means of a leave-one-out cross-validation strategy. This part can also be used to test the performance of the algorithm on a given training data set. The parameters determined in the classifier construction step are then used to classify new samples, which is implemented in the classifier application part. These functionalities are implemented by three modules:

1.1 Feature extraction

A learning-based feature extraction method is implemented in this module, which extracts regional features adapted to the training samples. The input is a set of training samples: morphological feature images for structural image classification, or other feature images for functional image classification. All images must be spatially normalized to a standard template space. Based on the given training samples, the template space is partitioned into regions according to the similarity of the discriminative measures of voxel-wise features. Regional measures, such as the mean value, are then extracted from these regions to form feature vectors for classification. A leave-one-out strategy is used to produce separate training and testing files for feature selection and classification.

1.2 Feature selection and classification

A hybrid feature selection method is implemented to obtain the most discriminative features for classification. It consists of a ranking-based feature selection followed by an SVM-based subset feature selection. With a set of discriminative and reliable features extracted and selected, a nonlinear SVM classifier is constructed to perform the classification. Feature selection and classification are applied to the leave-one-out training and testing files obtained in the feature extraction module. This module outputs the classification rates as a function of the number of features used for classification. By optimizing the SVM parameters and the number of features, an optimal classifier can be constructed.


1.3 Classifier application

The classifier constructed by the above modules, including the information obtained in feature extraction and the parameters chosen for feature selection and classification, can be used to predict the clinical variable of a new sample by inputting its morphological feature images into the constructed classifier. This module outputs the predicted value of the clinical variable (SVM score).

Besides these major functions, the software can also output an ROC curve and spatial group difference maps.


2. Description of COMPARE package

The software package is available at https://sbia-svn.uphs.upenn.edu/projects/COMPARE/ and contains:

2.1 readme file

 brief description of the package


2.2 sub-folders

   src: .h and .cc source files for the COMPARE classifiers; they are detailed in section 6
   script: shell scripts for training classifiers and testing new samples
   SVMTorch: the SVMTorch package; SVM classification and subset feature selection are built upon
   this package, so it has to be compiled using make before compiling the COMPARE package
   bin: the default target for executables
   doc: documents
   demo: image data for running a test

After checkout, run "make" to compile and "make install" to install the package (remember to "make" SVMTorch first). The software package uses "tar" and "gzip" to archive and compress parts of the output, so these programs must be available on your platform. The software has been tested with GNU tar version 1.13.25 and gzip version 1.3.3.

3. Code structure of the software package

3.1 Script files: Compare and Compare_test

Compare is the main script for building a classifier; it calls executables for data input, feature extraction, feature selection, and classifier training. Compare_test is a script that classifies new samples using classifiers trained by Compare. The executables called by Compare include:

 COMPARE_check_input
 This program checks whether the subject list input files are in the correct format for 
 the COMPARE package. If the input files are valid, the program generates text files 
 containing information about the training and testing sets.

 COMPARE_extract
 This program extracts regional features by learning from training samples. 
 The implementation of the regional feature extraction method can be found in 
 the class: Feature_Extraction. (Feature_Extraction.h and Feature_Extraction.cc). 

 COMPARE_build
 This program performs subset feature selection and builds classifiers. 
 The implementation of the support vector machine-recursive feature elimination can
 be found in the class: SVM_rfe (COMPARE_SVM_rfe.h COMPARE_SVM_rfe.cc). The implementation
 of the feature selection and SVM classification was based upon the software package: SVMTorch.

 COMPARE_build_check
 This program checks the classification performance using different SVM kernel sizes 
 and outputs the best kernel size.

 COMPARE_SVM_discriminate
 This program calculates the gradient directional difference between support vectors 
 belonging to different classes. Technical details can be found in references [3, 4]. 

 COMPARE_float_sum
 This program calculates the sum of spatial discrimination maps computed in 
 leave-one-out experiments.

 COMPARE_float_sum_m
 This program calculates the sum of spatial discrimination maps when different numbers 
 of features were used to build the classifier.

 COMPARE_model
 This program tests new samples using the COMPARE classifier. This classifier 
 contains not only an SVM classifier, but also information about the regional features'
 spatial locations and their order with respect to discriminative ability for classification.

 COMPARE_average_score
 This program calculates the mean of classification scores output by a series of classifiers.

 COMPARE_score_mean
 This program calculates the mean of classification scores output by a series of classifiers
 with different numbers of features.

 COMPARE_score_sum
 This program calculates the sum of classification scores output by a series of classifiers
 with different numbers of features.


3.2 Utility files: nrutil_compare.h nrutil_compare.cc

This is a collection of methods for array (1D vector), matrix (2D matrix), tensor (3D or higher dimension tensor), and some basic utility functions used in the package.


3.3 Class definition and implementation:

Feature_Extraction.h Feature_Extraction.cc 
These files contain the implementation of the regional feature extraction method. 

COMPARE_SVM_rfe.h COMPARE_SVM_rfe.cc 
These files contain the implementation of the support vector machine-recursive 
feature elimination algorithm, a well-known subset feature selection technique.
The implementation is based upon the software package SVMTorch3.

Feature_Extraction_model.h Feature_Extraction_model.cc
These files contain the implementation of how regional feature information is used 
for testing new samples.

3.4 List of parameters in Compare:

There are several optional parameters available for optimizing the classifier, classifying unknown subjects, and outputting the constructed classifier.

-k <int> : std value for Gaussian kernel in SVM (default: 1, range:1~10000)
           This is an SVM parameter that determines the Gaussian kernel size.
           It must be an integer. The default value is 1.

-j <int> : std value for Gaussian kernel in SVM (should be greater than k value, optional). 
           This parameter is used with parameter k to set the search range of the
           Gaussian kernel size in SVM. It must be an integer greater than k. 

-c <int> : C value in SVM (default: 10, range: 1~10000)
           This is an SVM parameter that determines the tradeoff between training error 
           and the margin. It must be an integer. The default value is 10.

-t file_name: subject list file containing subjects to be classified by the classifier trained
              This parameter specifies the test subject list. When a test subject list 
              file is given, a text file named test_result_file ("result_file" with the prefix
              "test_") is output, containing the SVM scores of the testing 
              subjects.

-s <float> : smoothing Gaussian kernel size for score map (default: 3.0, range: 1~10)
             This parameter is related to the number of regions generated 
             in the feature extraction step. A suitable value should be selected to 
             generate meaningful regions. When a larger value is selected, the program 
             generates relatively larger regions. 

-n <int> : searching space of feature number (default: 150, range:1~1500)
           This parameter is used to constrain the searching space of feature selection. 

-m model_name: name of a file to store the trained classifier (huge storage space required!!!)
               This parameter gives the name of the file in which the constructed classifier is stored.

-M <int> : starting point of a range of features for final classification 
           (default: the largest number of features that yields the best classification rate)

-N <int> : ending point of a range of features for final classification 
           (default: the largest number of features that yields the best classification rate)
           These two parameters are used to input the range of features for final  
           classification.

-S spatial_map: prefix of name of group difference spatial maps (images in float format) 
                (default: no output) 
                This parameter gives the file name prefix used to store the estimated 
                group difference maps. These group difference maps correspond to 
                the features used in classification.

3.5 List of parameters in Compare_test:

There are three parameters:

model_file : file name of the classifier trained by "Compare"

data_list_file: file name of the subject list file containing subjects to be classified

result_file: file name of the file containing the classification results

4. Compilation and Installation of COMPARE package

4.1 Platform

The software package is available for Linux on x86 machines. It has been tested on the kernel version 2.6.9-34.ELsmp. The software package has been successfully compiled using g++ (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) and g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8).

4.2 Download

The COMPARE package is available via the Subversion (SVN) version control system of SBIA. To check out the current version of the software, run the following svn command in the folder of your choice:

svn co https://sbia-svn.uphs.upenn.edu/projects/COMPARE/

This command downloads the software package into your folder and stores it in a sub-folder named "COMPARE".

4.3 Content in COMPARE

./bin       executables of COMPARE
./SVMTorch  SVMTorch package, which is used to compile the feature selection and SVM classification programs
./doc       documents
./demo      sample data to test the software package
./script    script files
./src       source files of the software package

4.4 Compiling

Go to ./SVMTorch/ folder, and type

make

Go to ./src/ folder, and type

make
make install
make clean

Use the variable INSTALLDIR=/path to set your target installation path (for example, make install INSTALLDIR=/path).

If no errors occur during compilation, the package is ready to use.

4.5 Setting

Before training and testing classifiers, you need to make the installed executables available. To add the installation bin/ folder to your system PATH, use the following command (csh/tcsh syntax):

setenv PATH ${PATH}:<your installation path>/bin

4.6 Testing the package

For an automatic test, type "make test" in the ./src folder (remember to use INSTALLDIR=/path if you specified an installation path in make install), then skip to section 5.

To run the test manually, follow the instructions below.
Go to the ./demo/ folder, which has three subfolders (data, list, comp) and a readme file. To run a test, enter the ./demo/comp/ subfolder.

4.6.1 Build a classifier:

Compare ../list/test_sub.lst bbl_test_rate.txt -m bbl_test.mdl 


This command trains a classifier with default options and writes it to "bbl_test.mdl".

Besides the classifier, two other files are produced:

bbl_test_rate.txt: contains the best classification rate, classification rates as a 
                   function of the number of features used in classification, and
                   true positive and false positive rates for plotting the ROC curve.
COMPARE_svm_kernel_size.bin: a text file containing the SVM kernel size used in classification

This classifier can be used to predict new samples. Training takes about 15 minutes on the Olympus platform.

4.6.2 Use the classifier to classify other samples:

Compare_test bbl_test.mdl ../list/test_sub.lst test_result.txt


This command uses the classifier built in the previous step to predict the class labels of the samples in ../list/test_sub.lst. Rather than class labels (+1 or -1), the output consists of classification scores, which are real values with positive or negative signs.
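As an illustration of how signed scores relate to class labels, the following Python sketch thresholds scores at zero and computes a classification rate. It is not part of COMPARE: the function names are hypothetical, the layout of the result file is not assumed (the scores are passed in as a plain list), and mapping a score of exactly zero to -1 is an arbitrary choice made here.

```python
def scores_to_labels(scores):
    """Map real-valued SVM scores to class labels by their sign.

    Assumption of this sketch: a score of exactly 0 is mapped to -1.
    """
    return [1 if s > 0 else -1 for s in scores]


def classification_rate(scores, true_labels):
    """Fraction of subjects whose predicted label matches the true label."""
    predicted = scores_to_labels(scores)
    correct = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return correct / len(true_labels)


# Made-up scores and labels, for illustration only.
example_scores = [1.3, -0.2, 0.4, -2.0]
example_labels = [1, -1, -1, -1]
print(scores_to_labels(example_scores))
print(classification_rate(example_scores, example_labels))
```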

To test whether the software package was properly compiled and installed, compare the output with the results stored in the folder ./demo/comp/test1.

5. Usage of the COMPARE package

5.1 Preparation of Data

The classification method is designed to learn a model (classifier) from a cohort of subjects with known clinical variables, in order to predict the clinical variables of new, unknown subjects. The raw features are structural MR images. From these images, more meaningful features are typically extracted, such as morphological information (a good example is RAVENS maps), which are spatially normalized to a standard template space. There is generally more than one kind of feature available for each subject; for example, RAVENS maps provide three kinds of features: WM maps, GM maps, and CSF maps. However many kinds of features are available, the number used for classification must be the same for all subjects, i.e., each subject must be associated with the same number of feature images spatially normalized to a standard template space. For convenience, a single subject list file should be created in the format below:

m	n
xdim	ydim	zdim
/your dir/data dir
subject1_{1}.img	subject1_{2}.img	...     subject1_{n}.img	1
subject2_{1}.img	subject2_{2}.img	...	subject2_{n}.img	1
...			...			...	...                     .
subject{i}_{1}.img	subject{i}_{2}.img 	...	subject{i}_{n}.img 	-1
subject{i+1}_{1}.img	subject{i+1}_{2}.img 	...	subject{i+1}_{n}.img 	-1
...			...		        ...	...                     . 
subject{m}_{1}.img	subject{m}_{2}.img 	...	subject{m}_{n}.img 	-1

The number of subjects (m) and the number of features per subject (n) are stored in the first line. The x, y, and z dimensions of the images are stored in the second line. The third line contains the location of the images.

The file names of the feature images and their associated class labels are stored in the remaining lines (the image files must be in 2-byte format). Each line corresponds to one subject. For training data, the class labels must be +1 or -1. For testing data, the labels are typically set to 0, but can be any value. File names may be separated by spaces or tabs.

A sample training subject list file for female schizophrenia classification is shown below. This file contains 61 subjects with 3 features (RAVENS maps: White Matter, Grey Matter, and CSF). The image dimension is 94x111x91.

61      3
94      111     91
/folder containing input images
10625am0_WM.img	 10625am0_GM.img	 10625am0_VN.img	1
10626am1_WM.img	 10626am1_GM.img	 10626am1_VN.img 	1
10629am0_WM.img	 10629am0_GM.img	 10629am0_VN.img 	1
10646am0_WM.img	 10646am0_GM.img	 10646am0_VN.img 	1
10651am0_WM.img	 10651am0_GM.img	 10651am0_VN.img 	1
10655am0_WM.img	 10655am0_GM.img	 10655am0_VN.img 	1
10707am0_WM.img	 10707am0_GM.img	 10707am0_VN.img 	1
10725am0_WM.img	 10725am0_GM.img	 10725am0_VN.img 	1
10727am0_WM.img	 10727am0_GM.img	 10727am0_VN.img 	1
10732am0_WM.img	 10732am0_GM.img 	10732am0_VN.img 	1
...			 ... 			...		...
01027am1_WM.img	 01027am1_GM.img 	01027am1_VN.img	-1
10274am1_WM.img	 10274am1_GM.img	 10274am1_VN.img	-1
10335am1_WM.img	 10335am1_GM.img 	10335am1_VN.img 	-1
...			 ... 			...		...
11129am0_WM.img	 11129am0_GM.img	 11129am0_VN.img	-1
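To make the file layout concrete, here is a minimal Python sketch that parses a subject list in the format described above and checks it against its own header. It is an illustration only and not the logic of COMPARE_check_input; the names SubjectList and parse_subject_list are hypothetical, and labels are assumed to be integers, which holds for training files but may not hold for every testing file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubjectList:
    num_subjects: int                        # m, first line
    num_features: int                        # n, first line
    dims: Tuple[int, int, int]               # xdim, ydim, zdim, second line
    data_dir: str                            # third line, kept verbatim
    subjects: List[Tuple[List[str], int]]    # (n image file names, label)


def parse_subject_list(text: str) -> SubjectList:
    """Parse the contents of a subject list file into a SubjectList."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    m, n = (int(v) for v in lines[0].split())
    x, y, z = (int(v) for v in lines[1].split())
    data_dir = lines[2]  # may contain spaces, so it is not split
    subjects = []
    for ln in lines[3:]:
        fields = ln.split()  # spaces and tabs are both accepted
        if len(fields) != n + 1:
            raise ValueError(f"expected {n} file names and a label: {ln!r}")
        # Assumption of this sketch: the label is an integer (+1, -1, or 0).
        subjects.append((fields[:n], int(fields[n])))
    if len(subjects) != m:
        raise ValueError(f"header announces {m} subjects, found {len(subjects)}")
    return SubjectList(m, n, (x, y, z), data_dir, subjects)


# Two-subject excerpt in the same layout as the sample file above.
sample = (
    "2\t3\n"
    "94\t111\t91\n"
    "/folder containing input images\n"
    "10625am0_WM.img\t10625am0_GM.img\t10625am0_VN.img\t1\n"
    "11129am0_WM.img 11129am0_GM.img 11129am0_VN.img -1\n"
)
parsed = parse_subject_list(sample)
print(parsed.num_subjects, parsed.dims, parsed.subjects[1][1])
```

Checking the counts against the header catches truncated or mis-edited lists before a long training run starts.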

5.2 Classifier Construction and Application

Given the training data, the software can automatically build an optimal classifier using a leave-one-out cross-validation strategy. The constructed classifier can be used directly to classify unknown subjects. Classifier construction is implemented by a program named "Compare", which can also classify unknown subjects at the same time. Once a classifier is constructed, a program named "Compare_test" can be used to classify unknown subjects.

Compare:
Usage: Compare  data.lst result_file [options]
[options:]
-k <int> :      std value for Gaussian kernel in SVM (default:1, range:1~10000)
-j <int> :      std value for Gaussian kernel in SVM, be used to set the searching 
                range for std (should be greater than above value). This value and 
                the k value set the searching range of std value for Gaussian kernel in SVM.
-c <int> :      C value in SVM (default: 10, range:1~10000)
-t file_name:   subject list file containing subjects to be classified by the classifier 
                trained (default: no testing)
-s <float>:     smoothing size for score map (default: 3.0,range:1~10)
-n <int>  :     searching space of feature number (default: 150, range:1~1500)
-m model_name:  name of a file to store the trained classifier 
                (huge storage space required!!!) (default: no output)
-M <int> :      starting point of a range of features for final classification
                (default: the largest number of features that yields the best classification rate)
-N <int> :      ending point of a range of features for final classification
                (default: the largest number of features that yields the best classification rate)
-S spatial_map: prefix of name of group difference spatial maps (images in float format)
                (default: no output)

The required inputs to this program are data.lst and result_file. data.lst is a subject list file containing the training subjects. result_file is a text file that will contain the classification rates corresponding to the number of features used for classification, and the ROC curve. When no other parameters are given, this program therefore serves as a leave-one-out cross-validation of COMPARE.

There are several optional parameters available for optimizing the classifier, classifying unknown subjects, and outputting the constructed classifier.

-k <int> :      std value for Gaussian kernel in SVM (default: 1, range:1~10000)
                This is an SVM parameter that determines the Gaussian kernel size. 
                It must be an integer. The default value is 1.
-j <int> :      std value for Gaussian kernel in SVM (should be greater than k value). 
                This parameter is used with parameter k to set the search range of the
                Gaussian kernel size in SVM. It must be an integer greater than k. 
-c <int> :      C value in SVM (default: 10, range: 1~10000)
                This is an SVM parameter that determines the tradeoff between training 
                error and the margin. It must be an integer. The default value is 10.
-t file_name :  subject list file containing subjects to be classified by the classifier trained
                This parameter specifies the test subject list. When a test subject 
                list file is given, a text file named test_result_file 
                (result_file with the prefix test_) is output, containing the SVM scores 
                of the testing subjects.
-s <float> :    smoothing Gaussian kernel size for score map (default: 3.0, range: 1~10)
                This parameter is related to the number of regions generated 
                in the feature extraction step. A suitable value should be selected to 
                generate meaningful regions. When a larger value is selected, the program 
                generates relatively larger regions. 
-n <int> :      searching space of feature number (default: 150, range:1~1500)
                This parameter is used to constrain the search space of feature selection. 
-m model_name : name of a file to store the trained classifier (huge storage space required!!!)
                This parameter gives the name of the file in which the constructed classifier is stored.
-M <int>:       starting point of a range of features for final classification 
                (default: the largest number of features that yields the best classification rate)
-N <int>:       ending point of a range of features for final classification (default: the 
                largest number of features that yields the best classification rate)
                Together, M and N specify the range of features for final classification.
-S spatial_map: prefix of name of group difference spatial maps (images in float format) 
                (default: no output)
                This parameter gives the file name prefix used to store the estimated 
                group difference maps. These group difference maps correspond to
                the features used in classification.


Compare_test:
Usage: Compare_test  model_file  data_list_file  result_file

This program classifies the unknown subjects listed in data_list_file using the classifier stored in model_file, and writes the SVM scores to result_file. The model_file can be obtained by using Compare.

5.3 Example for classification of female Schizophrenia subjects

Given a training subject list file, bbl_female.lst, a classifier can be constructed by

Compare ../list/bbl_female.lst bbl_female_rate.txt -k 3 -j 7 -c 10 -n 100 -s 3.5 
-m bbl_female.mdl -S bbl_female_map -t  ../list/test_sub.lst


To classify the subjects in ../list/test_sub.lst with the stored classifier, you can then type

Compare_test bbl_female.mdl ../list/test_sub.lst test_result.txt

The results of this example are available at ./demo/test2.

This example can be run in any directory you like, provided the directory information in the data list files is correct. For easy testing of the software, run the above commands in "comp".

6. Details of the COMPARE package

This section details the programs used in the COMPARE software package, in case you want to run a classification analysis in a step by step way. This is prepared for users with solid knowledge of feature extraction, feature selection, and SVM based classification.

6.1 Feature Extraction

The features used for brain classification are extracted from automatically generated regions, which are obtained by learning from the training data.

The brain regions are generated by automatically partitioning the template brain space according to the similarity of classification power of local voxel-wise features. The classification power of each voxel-wise feature is measured by the Pearson correlation (pc) between this feature and its associated class label and a spatial consistency measure (sc) among this feature's spatial neighboring features. These two measures are combined to form an overall measure. A watershed segmentation algorithm is used to partition the template brain space into different regions. In order to avoid oversegmentation, Gaussian smoothing is applied to the score map before computing its gradient map.
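The Pearson correlation part of this score can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code in Feature_Extraction.cc: images are represented as flattened lists of voxel values, the function names are hypothetical, a constant voxel is given a score of 0, and the spatial consistency measure (sc), its combination with pc, the Gaussian smoothing, and the watershed step are all omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption of this sketch: constant input scores 0
    return cov / math.sqrt(var_x * var_y)


def voxel_correlations(images, labels):
    """Correlate every voxel with the class label across subjects.

    images: one flattened image (list of voxel values) per subject.
    labels: one class label (+1 or -1) per subject.
    """
    num_voxels = len(images[0])
    return [
        pearson([image[v] for image in images], labels)
        for v in range(num_voxels)
    ]


# Four subjects, two voxels each; the second voxel is constant.
toy_images = [[1, 5], [2, 5], [3, 5], [4, 5]]
toy_labels = [1, 1, -1, -1]
print(voxel_correlations(toy_images, toy_labels))
```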

Two methods are used to compute the regional feature from the generated regions. The first method takes the mean value of all voxels within a region. The second method is a selective volumetric increment method, similar to forward feature selection. We first select the voxel with the highest discriminative power in each region under consideration. Then, we include each neighboring voxel, provided that its inclusion does not decrease the discriminative power of the regional feature calculated from the currently selected voxels. This procedure is iterated, similarly to a traditional region growing method, until no more voxels can be added to the set of selected voxels. The discriminative power of a regional feature is measured by the absolute value of the Pearson correlation coefficient between this regional feature and the class label.
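The selective volumetric increment procedure can be sketched as a greedy region growing loop. The Python below is a toy illustration, not the package's implementation: voxels are integer ids, adjacency is supplied as a hypothetical `neighbors` dictionary, the regional feature is taken to be the mean over the selected voxels (the correlation is unchanged if a sum is used instead), and the order in which neighbors are visited is an arbitrary choice of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def grow_region(images, labels, region, neighbors):
    """Greedy voxel selection inside one region.

    images:    one list of voxel values per subject, indexed by voxel id
    labels:    one class label (+1 or -1) per subject
    region:    set of voxel ids forming the watershed region
    neighbors: dict mapping a voxel id to its adjacent voxel ids
    Returns the selected voxel ids and their discriminative power.
    """
    def power(voxels):
        # Regional feature per subject: mean over the selected voxels.
        feature = [sum(img[v] for v in voxels) / len(voxels) for img in images]
        return abs(pearson(feature, labels))

    # Seed: the single voxel with the highest discriminative power.
    seed = max(sorted(region), key=lambda v: power([v]))
    selected, best = [seed], power([seed])
    changed = True
    while changed:
        changed = False
        frontier = sorted({u for v in selected for u in neighbors[v]
                           if u in region and u not in selected})
        for u in frontier:
            candidate = power(selected + [u])
            if candidate >= best:  # keep u only if power does not decrease
                selected.append(u)
                best = candidate
                changed = True
    return selected, best


# Toy region: a chain of four voxels, of which 1 and 2 track the label.
toy_images = [[0, 2, 3, 0], [1, 2, 3, 5], [0, 0, 1, 5], [1, 0, 1, 0]]
toy_labels = [1, 1, -1, -1]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(grow_region(toy_images, toy_labels, {0, 1, 2, 3}, chain))
```

The loop stops when a full pass over the frontier adds no voxel, which mirrors the stopping rule described above.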

A leave-one-out strategy has been adopted here to optimize all the parameters related to the classification. So the feature extraction is implemented for leave-one-out cross-validation. For the classification of new samples, a classifier application module has been developed, which is introduced in section 6.3.

With the given {n} training samples with {t} kinds of features, the regional features can be extracted by the program: COMPARE_extract

Usage: COMPARE_extract [subject_list] [loo_id] <options> 
where <options> is one or more of the following:
<-sigma       value>    Gaussian smoothing kernel size (default:3.0)

The necessary information that should be input to the program includes

[subject_list]		subject list file containing information of training subjects.
[loo_id]		subject to be left out (start from 0)

Besides the necessary information that should be input to the program, there is only 1 optional parameter:

<-sigma     value>    smoothing Gaussian kernel size, which determines the region size (indirectly).

The outputs of this program are:

w_roi_region_{loo_id}_{t}.bin			large regions from watershed segmentation
r_roi_region_{loo_id}_{t}.bin			refined regions by incremental regional growing
w_roi_feature_location_{loo_id}.bin(text file)	spatial location information of features computed from w_roi, 
                                                which contains the information  of each feature's correspondence 
                                                to brain regions generated by the watershed segmentation method.
r_roi_feature_location_{loo_id}.bin(text file)	spatial location information of features computed from r_roi, 
                                                which contains the information of each feature's correspondence 
                                                to brain regions generated by selective volumetric increment method.
w_roi_train_{loo_id}.bin(text file)		feature file computed from w_roi, for training
r_roi_train_{loo_id}.bin(text file)		feature file computed from r_roi, for training
w_roi_test_{loo_id}.bin(text file)		feature file computed from w_roi, for testing
r_roi_test_{loo_id}.bin(text file)		feature file computed from r_roi, for testing
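Because COMPARE_extract handles one left-out subject per call, a complete leave-one-out run needs one invocation per subject. The Python sketch below only builds those command lines from the usage shown above; it does not run them, it is not taken from the Compare script, and the helper name and the file name train.lst are hypothetical.

```python
def extract_commands(subject_list, num_subjects, sigma=3.0):
    """Build one COMPARE_extract argument list per left-out subject.

    Follows the documented usage:
        COMPARE_extract [subject_list] [loo_id] -sigma value
    with loo_id starting from 0.
    """
    return [
        ["COMPARE_extract", subject_list, str(loo_id), "-sigma", str(sigma)]
        for loo_id in range(num_subjects)
    ]


# Hypothetical training list with three subjects.
for command in extract_commands("train.lst", 3):
    print(" ".join(command))
```

Each argument list could be passed to subprocess.run once the executables are installed and on the PATH.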

6.2 Feature Selection and Classifier Construction

Based on the feature files generated in the last step, a hybrid feature selection method is implemented to select the most discriminative features for classification. It consists of two components: correlation-based feature ranking and SVM-based subset feature selection. The correlation-based feature ranking method provides an initial feature set, which is further optimized by the subset feature selection method. With the selected feature set, nonlinear support vector machines with a Gaussian kernel are used to train classifiers. This step is also used to optimize the classification parameters, i.e., the leave-one-out cross-validation results can be used as a criterion to tune the classification parameters.

Feature selection and classifier construction is implemented by the program COMPARE_build

Usage: COMPARE_build [data_directory] [num_subject] [training file name base] [testing file name base] <options>

where <options> is one or more of the following:
<-num_feature       value>   number of candidate features to be used for feature selection and classification (default:150)
<-c                 value>   SVM parameter: trade-off between training error and the margin (default:50)
<-std               value>   SVM parameter: kernel size in rbf kernel (default:100)
<-start_point       value>   starting point of a range of features for final classification 
                             (default: the number of features that yields the best classification rate)
<-end_point         value>   ending point of a range of features for final classification 
                             (default: the number of features that yields the best classification rate)

The necessary information that should be input to the program includes,

[data_directory] 		name of the folder in which the training and testing files are located
[num_subject]			number of samples available
[training file name base]	base name of training files, can be r_roi_train, or w_roi_train
[testing file name base]	base name of testing files, can be r_roi_test, or w_roi_test

The optional parameters are the number of candidate features that defines the search space of feature selection, the SVM parameters (the kernel size and the trade-off between training error and margin), and the range of features used in the final classification.
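As a convenience, the command line can be assembled programmatically. The sketch below only builds the argument list from the usage shown above; the directory name and the subject count are hypothetical, and the helper function itself is not part of the package.

```python
import shlex

def compare_build_command(data_directory, num_subject, train_base, test_base,
                          num_feature=None, c=None, std=None,
                          start_point=None, end_point=None):
    """Build the argument list for COMPARE_build.
    Options left as None are omitted so the program uses its defaults."""
    argv = ["COMPARE_build", data_directory, str(num_subject),
            train_base, test_base]
    options = [
        ("-num_feature", num_feature),
        ("-c", c),
        ("-std", std),
        ("-start_point", start_point),
        ("-end_point", end_point),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv

# Hypothetical example: 40 subjects, features computed from r_roi.
cmd = compare_build_command("./features", 40, "r_roi_train", "r_roi_test",
                            num_feature=150, c=50, std=100)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run; repeating the call over several values of -c and -std is one way to tune the parameters against the leave-one-out classification rate.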

The outputs of the program are:

order_0.bin, order_1.bin, ..., order_n.bin (text files) feature order files corresponding to each leave-one-out case.
svm_model_0.bin, svm_model_1.bin, ..., svm_model_n.bin SVM model files corresponding to each leave-one-out case, 
                                                       that yield the best average classification rate.
classification_rate.bin (text file)                    classification rates with respect to different number of 
                                                       features used for classification, and a ROC curve corresponding 
                                                       to the constructed classifier. 

The outputs of this program can be used together with the outputs of the feature extraction step to classify new subjects.

6.3 Classifier Application

With the outputs of the above two programs, the prediction for new samples can be performed by COMPARE_model. The data of the new samples should be organized as described in section "Preparation of Data".

Usage: COMPARE_model [subject_list] [region_file_base] [feature_location_file] [feature_order_file] [svm_model_file]
       

The necessary information that should be input to the program includes,

[subject_list]            the subject list file containing information of testing subjects.
[region_file_base]        the region file name base 
[feature_location_file]   the feature location file
[feature_order_file]      the feature order file
[svm_model_file]          the SVM model file

The last four inputs must all correspond to the same classifier (model), i.e., the one you selected.

The outputs are the SVM scores corresponding to the testing samples, which are stored in a text file.
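The scores can be turned into class labels by thresholding. The sketch below is only an illustration: it assumes that the text file holds one score per line (taking the last whitespace-separated field) and that a positive score indicates the positive class; neither the file layout nor the sign convention is specified here, so check both against your own output.

```python
def parse_scores(text):
    """Parse SVM scores from text, assuming one score per non-empty line.
    The last whitespace-separated field of each line is taken as the score."""
    scores = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            scores.append(float(line.split()[-1]))
    return scores

def predict_labels(scores, threshold=0.0):
    """Assign +1 to scores above the threshold and -1 otherwise."""
    return [1 if s > threshold else -1 for s in scores]

# Hypothetical file content for three testing subjects.
example_output = "0.83\n-1.20\n0.05\n"
scores = parse_scores(example_output)
predicted = predict_labels(scores)
print(predicted)
```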

6.4 Group Difference

To interpret the classification results, a discriminative direction method [3, 4] can be used to estimate the differences between the two groups. Since leave-one-out validation is generally performed to test the generalizability of COMPARE, the group difference can be constructed by averaging the group difference maps obtained from all leave-one-out cases. For each leave-one-out case, the group difference map is estimated in three steps. First, for each support vector, its corresponding projection vector on the other side of the separating hypersurface is found by following the steepest gradient of the classification function. The difference between the support vector and its projection vector reflects the changes in the selected regional features when a normal brain changes to the respective configuration in the patient group, or vice versa. Second, by summing up the regional differences calculated from all support vectors, an overall group difference vector is obtained for the leave-one-out case under study. Finally, the group difference vector is mapped to its corresponding brain regions in the template space and subsequently added to those of the other leave-one-out repetitions. The program used to estimate the group difference along the discriminative direction is

COMPARE_SVM_discriminate

Usage: COMPARE_SVM_discriminate [model_file] [difference_file]

The parameters are:

[model_file]        the SVM model file
[difference_file]   file in which to store the difference vector
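The first two steps described in this section can be sketched for a Gaussian-kernel SVM as follows. This is a toy illustration under stated assumptions, not the code of COMPARE_SVM_discriminate: the kernel is taken as exp(-||x - s||^2 / (2 sigma^2)), the boundary is crossed with a fixed-length normalized gradient step, and differences are accumulated as "positive side minus negative side". The step size, the sign convention, and all names are hypothetical.

```python
import math

def decision(x, svs, coefs, b, sigma):
    """SVM decision value f(x) = sum_i coef_i * K(s_i, x) + b."""
    total = b
    for s, a in zip(svs, coefs):
        d2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += a * math.exp(-d2 / (2 * sigma ** 2))
    return total

def gradient(x, svs, coefs, sigma):
    """Gradient of the decision function with respect to x."""
    g = [0.0] * len(x)
    for s, a in zip(svs, coefs):
        d2 = sum((xi - si) ** 2 for xi, si in zip(x, s))
        k = math.exp(-d2 / (2 * sigma ** 2))
        for j in range(len(x)):
            g[j] += a * k * (s[j] - x[j]) / sigma ** 2
    return g

def project_across(x0, svs, coefs, b, sigma, step=0.01, max_steps=10000):
    """Follow the steepest gradient from x0 until the decision value
    changes sign; return the first point on the other side, or None."""
    x = list(x0)
    side = 1.0 if decision(x, svs, coefs, b, sigma) >= 0 else -1.0
    for _ in range(max_steps):
        if decision(x, svs, coefs, b, sigma) * side < 0:
            return x
        g = gradient(x, svs, coefs, sigma)
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0:
            return None
        x = [xi - side * step * gi / norm for xi, gi in zip(x, g)]
    return None

def group_difference(svs, coefs, b, sigma):
    """Sum, over all support vectors, the difference between the point on
    the positive side and the point on the negative side of the boundary."""
    diff = [0.0] * len(svs[0])
    for s in svs:
        p = project_across(s, svs, coefs, b, sigma)
        if p is None:
            continue
        side = 1.0 if decision(s, svs, coefs, b, sigma) >= 0 else -1.0
        for j in range(len(diff)):
            diff[j] += side * (s[j] - p[j])
    return diff

# Toy model (hypothetical): two support vectors in a 2-feature space.
svs = [[1.0, 0.0], [-1.0, 0.0]]
coefs = [1.0, -1.0]  # alpha_i * y_i
diff = group_difference(svs, coefs, b=0.0, sigma=1.0)
print(diff)
```

In the toy model the two groups differ only in the first feature, so the difference vector points along that axis. Averaging such vectors (or the maps derived from them) over all leave-one-out cases gives the final group difference.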

The group difference vector can be mapped to the brain regions from which the features are extracted in the template space by

COMPARE_map_svm_diff_to_region
Usage: COMPARE_map_svm_diff_to_region output_file_base feature_file order_file weighting_file feature_start_id
       feature_end_id region_file_base xsize ysize zsize

The parameters are:

output_file_base ->	              name of the outputs (each kind of feature maps has one regional map)
feature_file     ->	              name of file storing the information of feature extraction. 
order_file ->		              name of file storing the information of feature selection.
weighting_file-> 	              name of file storing the group difference vector
feature_start_id,feature_end_id ->    index range of the features used in classification. 
                                      Typically the start_id is 0, and the end_id is 
                                      (the number of features minus one).
region_file_base->	              name of files storing the brain regions
xsize ysize zsize->                   spatial dimension of feature maps
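Conceptually, the mapping paints each selected feature's weight onto the voxels of the brain region it was extracted from. The sketch below illustrates this for a single kind of feature map, assuming a label volume stored as a flat list, a table giving the region label of each feature, and one weight per selected feature; these data layouts and names are assumptions for illustration and do not describe the file formats read by COMPARE_map_svm_diff_to_region.

```python
def map_difference_to_regions(region_labels, feature_to_region, order,
                              weights, start_id, end_id):
    """Return a volume (flat list) in which each voxel holds the summed
    weight of the selected features extracted from its region.

    region_labels     : region label of every voxel (0 = background here)
    feature_to_region : region label for each feature index
    order             : feature indices sorted by the feature selection
    weights           : group difference value for ranks start_id..end_id
    """
    region_weight = {}
    for rank in range(start_id, end_id + 1):
        feature = order[rank]
        region = feature_to_region[feature]
        region_weight[region] = (region_weight.get(region, 0.0)
                                 + weights[rank - start_id])
    return [region_weight.get(label, 0.0) for label in region_labels]

# Toy volume (hypothetical): xsize=2, ysize=2, zsize=1, two regions.
xsize, ysize, zsize = 2, 2, 1
region_labels = [0, 1, 1, 2]
feature_to_region = [1, 2]   # feature 0 -> region 1, feature 1 -> region 2
order = [1, 0]               # feature 1 was ranked first
weights = [0.5, -0.25]       # difference values for ranks 0 and 1
volume = map_difference_to_regions(region_labels, feature_to_region,
                                   order, weights, 0, 1)
print(volume)
```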

6.5 ROC curve

The ROC curve is generated by COMPARE_build, which implements the method presented in [5]. The flowchart of the algorithm can be found in that reference.
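For orientation, the following is a minimal sketch of ROC point generation in the spirit of the procedure in [5]: samples are sorted by decreasing score, and a point is emitted each time the score changes. It is an independent illustration with hypothetical names and toy data, not the code inside COMPARE_build.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate),
    sweeping the decision threshold from high to low scores."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    prev = None
    for score, label in ranked:
        if score != prev:
            # Emit a point only when the score changes, so that ties
            # are handled as a single threshold.
            points.append((fp / n_neg, tp / n_pos))
            prev = score
        if label > 0:
            tp += 1
        else:
            fp += 1
    points.append((fp / n_neg, tp / n_pos))  # always ends at (1, 1)
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy example (hypothetical): SVM scores and true labels of 4 subjects.
points = roc_points([0.9, 0.8, 0.7, 0.6], [1, -1, 1, -1])
print(points)
print(area_under_curve(points))
```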

7. Reference

[1] C. Davatzikos, A. Genc, D. Xu, and S. M. Resnick, "Voxel-Based Morphometry Using the RAVENS Maps: Methods and Validation Using Simulated Longitudinal Atrophy," NeuroImage, vol. 14, pp. 1361-1369, 2001.

[2] D. Shen and C. Davatzikos, "HAMMER: Hierarchical attribute matching mechanism for elastic registration," IEEE Transactions on Medical Imaging, vol. 21, pp. 1421-1439, 2002.

[3] P. Golland, W. E. L. Grimson, M. E. Shenton, and R. Kikinis, "Deformation Analysis for Shape Based Classification," presented at the 17th International Conference on Information Processing in Medical Imaging, 2001.

[4] P. Golland, B. Fischl, M. Spiridon, N. Kanwisher, R. L. Buckner, M. E. Shenton, R. Kikinis, A. Dale, and W. E. L. Grimson, "Discriminative Analysis for Image-based Studies," presented at the Fifth International Conference on Medical Image Computing and Computer Assisted Intervention, Tokyo, Japan, 2002.

[5] T. Fawcett, "ROC graphs: notes and practical considerations for data mining researchers," HP Laboratories Palo Alto, HPL-2003-4, 01-07-2003.
