Oct 11, 2018  08:10 AM | Ali Amad
RE: Clarification on contrasts in CONN 2nd-level multivariate analysis
Dear Alfonso,

First of all, thank you for your help on this topic and for all the clarifications.

I have worked with Martyn's team on these analyses. Following the MVPA analysis, we do indeed have some significant results when using parametric statistics, and none with non-parametric statistics. However, using the results from the parametric statistics as seeds in seed-to-voxel analyses, we found several interesting results.

Do you think that all these results are worthless and cannot be used at all? Or can they be discussed with caution?

Many thanks for all your help.

Best wishes,

Ali

Originally posted by Alfonso Nieto-Castanon:
Hi Martyn,

Thank you for the clarification (and no need to stop the questions, I appreciate the interesting points you are bringing up).

Regarding the easy portion of your question: in this particular case (MVPA analyses) the best way to proceed, in my opinion, would be to use the multivariate (non-parametric) analysis version in CONN. Beyond the point regarding the correct degrees of freedom (more on this below), the main problem for univariate SPM analyses in the context of MVPA is that the components entered into your second-level analysis are computed using an entirely different Principal Component decomposition for each voxel. This means that, even if the subspace spanned by those components is expected to be similar for nearby voxels, the identity of the components is not (i.e. the same component across two adjacent voxels cannot be guaranteed to represent the same or a similar aspect of the pattern of connectivity with the rest of the brain). As a result, the standard SPM assumptions in second-level analyses (including homogeneity of covariance across voxels as well as random field smoothness assumptions) are not all that reasonable in this case. Hence the recommendation here is to use multivariate analyses together with non-parametric cluster statistics (i.e. simply selecting "non-parametric stats" in the results explorer window).
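As a minimal sketch of why a multivariate test is a natural fit here (an illustration only, not CONN's actual implementation): if the component scores at a voxel are re-expressed in a rotated basis of the same subspace, the individual component means change, but a multivariate statistic such as Hotelling's T^2 does not. The simulated scores S, the random rotation Q, and the helper T2 are all hypothetical names introduced for this example.

```matlab
N = 10;                                  % number of subjects
L = 3;                                   % number of MVPA components
S = randn(N,L) + 0.5;                    % hypothetical component scores at one voxel (subjects x components)
[Q,~] = qr(randn(L));                    % arbitrary orthogonal change of basis within the same subspace
S2 = S*Q;                                % the same data expressed in the rotated component basis

% Hotelling's T^2 for the hypothesis "all component means are zero"
T2 = @(Y) N * mean(Y,1) * (cov(Y) \ mean(Y,1)');

disp('component means (original basis / rotated basis):');
disp([mean(S,1); mean(S2,1)]);           % these rows differ: component identity is basis-dependent
fprintf('T2 original: %.6f, rotated: %.6f\n', T2(S), T2(S2));   % these agree up to rounding
```

Any analysis that treats "component 1" as the same quantity across voxels depends on the first printed matrix; a test that pools across components in a rotation-invariant way depends only on the subspace.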

Now, coming back to the more difficult but perhaps also more interesting portion of your question: CONN does NOT include a subject factor explicitly in the second-level design matrix, even in the context of these repeated-measures analyses. I do not believe it would be appropriate to do so, for the following reasons (but please feel free to correct me and/or let me know your thoughts):

1) in the example analysis that you describe, subject effects (average MVPA component values for each component pre- and post-intervention) are already explicitly excluded by computing the "post - pre" differences in component scores, separately for each subject, and entering those differences directly (instead of entering both the pre- and post- scores separately) into the second-level analyses;

2) the same is not done across MVPA components (i.e. entering the difference in scores between components) because we are not interested in testing whether there are any differences between the 3 MVPA component scores, but rather whether there are any effects when pooling across the 3 components; and

3) as far as I understand, the ReML pooled estimation of the covariance between the DVs, when the within-subject factor is modeled as having non-independent levels, is followed by a whitening step, which (I guess assuming that the covariance is not rank-deficient) should effectively remove the dependencies between the DVs (implicitly making the degrees of freedom of the F statistic used in these analyses correct).
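A minimal numerical check of the whitening argument in point 3, assuming the covariance between the DVs is known exactly (in practice it would be a ReML estimate, which this sketch does not model). If the data are stacked DV by DV, their covariance is kron(Sigma, eye(N)), and multiplying by W = kron(Sigma^(-1/2), eye(N)) should turn it into the identity. The variable names Sigma, V and err are introduced here for illustration.

```matlab
N = 10;                                  % number of subjects
L = 3;                                   % number of DVs/components
A = randn(L); A = chol(eye(L) + A'*A);   % Cholesky factor of a random positive-definite covariance
Sigma = A'*A;                            % covariance between the L DVs (assumed known)
V = kron(Sigma, eye(N));                 % covariance of the stacked data (all subjects for DV 1, then DV 2, ...)
W = kron(sqrtm(inv(Sigma)), eye(N));     % whitening matrix

% after whitening, the covariance W*V*W' should be the identity (independent, unit-variance samples)
err = max(max(abs(W*V*W' - eye(L*N))));
fprintf('max deviation of whitened covariance from identity: %g\n', err);
```

The check only works when Sigma is full rank, which is the same caveat raised above about rank-deficient covariances.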

To illustrate the latter point (and for me to double-check what I am saying here, since I could be mistaken), I am including below some test code that simulates random data for 10 subjects and 3 non-independent DVs (e.g. simulating the three "differences in MVPA scores" variables in our example analyses), and then runs the "univariate (SPM)"-style second-level repeated-measures analysis without the subject factor in the design matrix (just as CONN does when using the "univariate (SPM)" option). The point is to illustrate that, without the whitening operation (i.e. if you were to change "W=..." in the code below to "W=eye(L*N)" in order to skip whitening), you would be exactly right that the F-statistics computed there would be incorrect and have the incorrect degrees of freedom. This is shown by the code producing a non-uniform distribution of p-values in this case and is, as far as I can tell, simply the result of the F-test's incorrect assumption that the 30 data samples are independent. With the whitening operation, however, the distribution of p-values is correct (i.e. uniform under the null hypothesis), indicating that the degrees of freedom being used in these analyses (F(3,27), rather than the F(3,9) degrees of freedom that you would get for a multivariate version of this same analysis or with the inclusion of subject factors) are correct.

In any case, please let me know your thoughts, and/or if I am misinterpreting your question (and of course I would be more than happy to hear more / learn about why the two-stage approach in SPM might not provide the correct F-statistics or degrees of freedom in the context of a within-subject factor with more than two levels). Hope this helps.

Alfonso

-----------------------------------
N = 10;  % number of subjects
L = 3;   % number of DVs/components

% prepares design and data
X = kron(eye(L),ones(N,1));           % design matrix (one column per DV, no subject factor)
C = eye(L);                           % contrast of interest
A = randn(L); A = chol(eye(L)+A'*A);  % Cholesky factor (dependence between DVs)
W = kron(sqrtm(inv(A'*A)),eye(N));    % whitening matrix (built from the known simulated covariance)
X = W*X;                              % whitened design matrix

% runs simulations (second-level analyses with null data)
nsim = 1e6;                           % number of simulations
dofe = size(X,1)-rank(X);             % denominator degrees of freedom
Nc0 = rank(X*C');                     % numerator degrees of freedom
iX = pinv(X'*X);
ir = pinv(C*iX*C');
iXX = iX*X';
FF = zeros(1,nsim);                   % preallocated F-statistics
for niter = 1:nsim
  Y = randn(N,L)*A;                   % simulated data (rows have covariance A'*A)
  Y = W*reshape(Y,[],1);              % data concatenated across DVs and whitened
  B = iXX*Y;                          % model parameters
  E = Y-X*B;                          % model error
  EE = sum(abs(E).^2,1);              % residual sum of squares
  h = C*B;                            % contrast estimate
  BB = sum(h.*(ir*h),1);              % contrast sum of squares
  F = real(BB./max(eps,EE))*dofe/Nc0; % F-statistic
  FF(niter) = F;
end

% display null distribution of p-values (expected uniform distribution if valid analysis)
p = 1-spm_Fcdf(FF,Nc0,dofe);          % spm_Fcdf requires SPM on the MATLAB path
hist(p,100)

-------------------------------------------------
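The comparison described above (whitening versus "W=eye(L*N)") can also be sketched side by side without SPM on the path. This is a reduced variant of the simulation, not the original code: it assumes the same design and contrast, uses far fewer simulations (nsim is an arbitrary choice), and replaces spm_Fcdf with a hypothetical helper Fcdf built on the core function betainc. It reports the fraction of null p-values below 0.05, which should be close to 0.05 for a valid test.

```matlab
N = 10; L = 3; nsim = 20000;              % nsim is an arbitrary, smaller number of simulations
A = randn(L); A = chol(eye(L) + A'*A);    % Cholesky factor (dependence between DVs)
X0 = kron(eye(L), ones(N,1));             % unwhitened design matrix (no subject factor)
C = eye(L);                               % contrast of interest
Fcdf = @(F,d1,d2) betainc(d1*F./(d1*F+d2), d1/2, d2/2);   % F cdf via the regularized incomplete beta function
Ws = {kron(sqrtm(inv(A'*A)), eye(N)), eye(L*N)};          % with whitening / without whitening
names = {'whitened', 'not whitened'};
fpr = zeros(1,2);                         % fraction of null p-values below 0.05

for k = 1:2
  W = Ws{k};
  X = W*X0;
  dofe = size(X,1) - rank(X);             % denominator degrees of freedom
  Nc0 = rank(X*C');                       % numerator degrees of freedom
  iX = pinv(X'*X);
  ir = pinv(C*iX*C');
  % each column is one simulated dataset: vec(randn(N,L)*A) = kron(A',eye(N))*vec(randn(N,L))
  Y = W*(kron(A', eye(N))*randn(L*N, nsim));
  B = iX*X'*Y;                            % model parameters, one column per simulation
  E = Y - X*B;                            % model error
  h = C*B;
  F = (sum(h.*(ir*h),1) ./ max(eps, sum(E.^2,1))) * dofe/Nc0;
  p = 1 - Fcdf(F, Nc0, dofe);
  fpr(k) = mean(p < 0.05);
  fprintf('%-13s: F(%d,%d), fraction of p<0.05 = %.3f\n', names{k}, Nc0, dofe, fpr(k));
end
```

With whitening, the reported fraction should sit near 0.05. Without it, the fraction will generally differ from 0.05 by an amount that depends on the randomly drawn A. As in the original simulation, the whitening matrix here uses the true covariance rather than an estimate.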

Originally posted by Martyn McFarquhar:
Hi Alfonso,

Thanks again for the detailed response, I promise the questions will stop soon!

Just to contextualise, we are still trying to resolve the discrepancy in results, and my collaborator informs me that they didn't select the "non-parametric" option in CONN, so I am assuming that the results they are reporting come from the univariate SPM approach. As such, I am just trying to understand this method fully so we can decide what to do.

I just want to check about the implementation of the "two-stage" approach, as the guidance given in the attached paper (and by Will Penny on the SPM Wiki) does not actually give the correct F-statistics or degrees of freedom when you have a within-subject factor with > 2 levels. I have been in discussions with Guillaume Flandin at the FIL about this and I can send you scripts and examples to show this if you want.

For now I just want to clarify how the model works in CONN because in order to get an equivalent to the multivariate test across components (i.e. when the M contrast matrix is the identity) you would still need multiple components in the same model and thus would still require the inclusion of the subject factor in the design matrix in order to get the correct degrees of freedom and compute the correct error term for the F-tests. Is this being done in CONN?
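For readers following the degrees-of-freedom question, here is a minimal sketch of what adding a subject factor does to the design matrix, assuming 10 subjects and 3 components stacked component by component. It only counts ranks and residual degrees of freedom; it does not address which contrast or error term is appropriate, which is the open question here. The names Xcond, Xsubj and the dof variables are hypothetical.

```matlab
N = 10;                                   % number of subjects
L = 3;                                    % number of components (within-subject levels)
Xcond = kron(eye(L), ones(N,1));          % one column per component
Xsubj = kron(ones(L,1), eye(N));          % one column per subject (subject factor)

X_without = Xcond;                        % design without a subject factor
X_with = [Xcond, Xsubj];                  % design with a subject factor (rank-deficient by one)

dof_without = size(X_without,1) - rank(X_without);   % residual degrees of freedom without subject factor
dof_with = size(X_with,1) - rank(X_with);            % residual degrees of freedom with subject factor
fprintf('rank without subject factor: %d, residual dof: %d\n', rank(X_without), dof_without);
fprintf('rank with subject factor:    %d, residual dof: %d\n', rank(X_with), dof_with);
```

Because the component columns and the subject columns both sum to a column of ones, the combined design loses one rank, so individual component means are no longer separately estimable and only differences between components can be tested in that model.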

Best wishes,
Martyn
