
**RE: Clarification on contrasts in CONN 2nd-level multivariate analysis**

Oct 11, 2018 06:10 PM | Alfonso Nieto-Castanon - *Boston University*

Hi Martyn,

Thank you for the additional clarifications and scripts; you continue to raise rather interesting points. I would like to follow up on your assertion that the two-stage approach is not optimal (i.e. that it has lower sensitivity than a differently-framed partitioned-errors approach) when a within-subjects factor has more than two levels. I say "differently-framed" because, as far as I can tell, the two-stage approach in the Henson & Penny paper is simply an implementation of the same partitioned-errors approach that SPSS and many other software packages use (and I will try to convince you of this below).

First, regarding the scripts that you sent me, there were a couple of minor glitches, if I am not mistaken, so just to get those out of the way (and please feel free to correct me if I am wrong):

1) in the computation of the two-level "Texture x Drink" statistics, I believe you have incorrectly defined the design matrix there. In particular, instead of:

X6 = blkdiag(ones(14,1), ones(14,1), ones(16,1), ones(16,1));

I believe it should read:

X6 = kron(blkdiag(ones(14,1),ones(16,1)),eye(2));

2) similarly, in the computation of the two-level "Location x Texture" statistics, instead of:

X7 = blkdiag(ones(28,1),ones(32,1));

I believe it should read:

X7 = kron(ones(30,1),eye(2));

3) and last, in the computation of the two-level "Location x Texture x Drink" statistics, instead of:

X8 = blkdiag(ones(14,1), ones(14,1), ones(16,1), ones(16,1));

I believe it should read:

X8 = kron(blkdiag(ones(14,1),ones(16,1)),eye(2));
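If it helps to sanity-check these corrections, here is a rough NumPy translation of the three corrected design matrices (purely illustrative; `scipy.linalg.block_diag` stands in for MATLAB's `blkdiag`, and the 14 + 16 = 30 subjects are as in the example):

```python
import numpy as np
from scipy.linalg import block_diag

# Corrected design matrices from points 1-3, translated to NumPy
X6 = np.kron(block_diag(np.ones((14, 1)), np.ones((16, 1))), np.eye(2))
X7 = np.kron(np.ones((30, 1)), np.eye(2))
X8 = np.kron(block_diag(np.ones((14, 1)), np.ones((16, 1))), np.eye(2))

# Each subject contributes 2 rows (one per within-subject difference),
# and each group-by-difference combination gets its own column
print(X6.shape)  # (60, 4)
print(X7.shape)  # (60, 2)
print(X8.shape)  # (60, 4)
```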

With those changes, the differences between the statistics reported by the two methods (the SPSS-like implementation and the CONN-like implementation) are not really as dramatic as they appeared originally. Out of the 7 effects tested, both methods report as strongly significant only the two-way within-subjects "Location x Texture" interaction, and the reported F-values across these 7 tests are also rather similar (two are identical, three are higher with the two-stage approach, and two are higher with SPSS's approach). Because of this I remain unconvinced that the SPSS-like approach would actually be more sensitive than the two-stage approach that CONN uses (at least given the current evidence). I also believe that the differences that we are seeing in this example are actually **only** due to the data not being really white/independent, so they reflect differences in how the two approaches handle violations of their sphericity assumptions rather than intrinsic differences in the sensitivity/power of the two approaches. To illustrate that point, I will use as an example the last three-way interaction tested (Location x Texture x Drink). Following your script notation, the SPSS-like approach would use something like:

% SPSS-like

Y8 = kron(eye(30),kron(eye(3),[1 -1])) * Y;

X8 = [kron(blkdiag(ones(14,1),ones(16,1)),eye(3)) kron(eye(30),ones(3,1))];

[t8, df8] = spm_ancova(X8,eye(90),Y8,[1 -1 0 -1 1 0 zeros(1,30);0 1 -1 0 -1 1 zeros(1,30)]')

which reports F=1.1671, while a CONN-like approach would use something like:

% CONN-like

Y8 = kron(eye(30),kron([-1 1 0;0 -1 1],[1 -1])) * Y;

X8 = kron(blkdiag(ones(14,1),ones(16,1)),eye(2));

[t8,df8] = spm_ancova(X8,eye(60),Y8,kron([1 -1],eye(2))')

which reports F=0.7480. The difference between these two approaches stems solely, in my opinion, from CONN having explicitly reduced the measures across the three "Texture" levels to two between-level differences before entering those into spm_ancova. To be precise, a similar two-stage approach that does not reduce the dimensionality of the 3-level "Texture" effect would be:

% CONN-like (but with 3 dimensions)

Y8 = kron(eye(30),kron([2 -1 -1;-1 2 -1;-1 -1 2],[1 -1])) * Y;

X8 = kron(blkdiag(ones(14,1),ones(16,1)),eye(3));

[t8,df8] = spm_ancova(X8,eye(90),Y8,kron([1 -1],eye(3))')
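As an aside, the equivalence between the 3-dimensional and 2-dimensional reductions can be checked directly: the 3x3 double-difference matrix `[2 -1 -1;-1 2 -1;-1 -1 2]` used above has rank 2 and spans exactly the same (zero-sum) subspace as the 2x3 pairwise-difference matrix `[-1 1 0;0 -1 1]`. A small NumPy check, just for illustration:

```python
import numpy as np

M3 = np.array([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]])  # 3-dim reduction (rows sum to zero)
M2 = np.array([[-1, 1, 0], [0, -1, 1]])                 # 2-dim reduction

# M3 is rank-deficient: only 2 of its 3 rows are linearly independent
print(np.linalg.matrix_rank(M3))                   # 2
# Stacking the two sets of rows adds no new directions: same row space
print(np.linalg.matrix_rank(np.vstack([M3, M2])))  # 2
```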

which, unsurprisingly, reports F=1.1671 (just like the SPSS-like approach). A multivariate analysis of this 3-way interaction test would be exactly invariant to precisely this sort of difference between the two CONN-like approaches (the reduction from three linearly-dependent dimensions to two dimensions), so the reason why we are seeing those differences here must be related to the remaining dependencies between DVs. But clearly, one limitation/problem with all of these simulations is that we have not considered that, in a real fMRI analysis scenario, the data would actually have been whitened before entering the spm_ancova computation. If we go ahead and do that (of course, in a real-world scenario this whitening comes from the estimation of the residual covariance across multiple voxels rather than from a single voxel, as we are considering in this example, but anyway, just for illustration purposes), we would have something like:

% SPSS-like (with actually-white residuals)

Y8 = kron(eye(30),kron(eye(3),[1 -1])) * Y;

X8 = [kron(blkdiag(ones(14,1),ones(16,1)),eye(3)) kron(eye(30),ones(3,1))];

y = Y8-X8*(X8\Y8); y=reshape(y,[],30)'; C=y'*y; W=kron(eye(30),real(sqrtm(pinv(C)))); X8=W*X8;Y8=W*Y8;

[t8, df8] = spm_ancova(X8,kron(eye(30),C),Y8,[1 -1 0 -1 1 0 zeros(1,30);0 1 -1 0 -1 1 zeros(1,30)]')

% CONN-like (with actually-white residuals)

Y8 = kron(eye(30),kron([-1 1 0;0 -1 1],[1 -1])) * Y;

X8 = kron(blkdiag(ones(14,1),ones(16,1)),eye(2));

y = Y8-X8*(X8\Y8); y=reshape(y,[],30)'; C=y'*y; W=kron(eye(30),real(sqrtm(pinv(C)))); X8=W*X8;Y8=W*Y8;

[t8,df8] = spm_ancova(X8,kron(eye(30),C),Y8,kron([1 -1],eye(2))')

% CONN-like (but with 3 dimensions with actually-white residuals)

Y8 = kron(eye(30),kron([2 -1 -1;-1 2 -1;-1 -1 2],[1 -1])) * Y;

X8 = kron(blkdiag(ones(14,1),ones(16,1)),eye(3));

y = Y8-X8*(X8\Y8); y=reshape(y,[],30)'; C=y'*y; W=kron(eye(30),real(sqrtm(pinv(C)))); X8=W*X8;Y8=W*Y8;

[t8,df8] = spm_ancova(X8,kron(eye(30),C),Y8,kron([1 -1],eye(3))')

and now all three approaches report exactly the same F=1.1006 value.
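For what it's worth, the `W = real(sqrtm(pinv(C)))` step in the scripts above is doing exactly what one would hope: for a full-rank residual covariance C it produces residuals with identity covariance. A quick NumPy sketch (with a made-up covariance matrix, purely for illustration):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
C = A @ A.T + np.eye(4)                # an arbitrary full-rank covariance
W = np.real(sqrtm(np.linalg.pinv(C)))  # whitening matrix, as in the scripts

# Whitening maps covariance C to the identity: W C W' = I
print(np.allclose(W @ C @ W.T, np.eye(4)))  # True
```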

So, summarizing, the "two-stage" approach that SPM and CONN use is identical, as far as I can tell, to the standard partitioned-errors approach once the data have been whitened to account for the repeated-measures dependency structure (as SPM & CONN do). In addition, residual differences between the approaches likely reflect remaining departures from truly-white residuals, and as such are, in my opinion, unlikely to be systematic (i.e. sensitivity should not be consistently higher for one approach vs. the other). Please let me know your thoughts/comments/suggestions.

I should also probably mention that I am of course aware that this is a somewhat "thorny"/complicated issue on the SPM list. SPM gives users the flexibility to implement an incredibly wide variety of analyses. This flexibility also means that users have to make a lot of perhaps-not-obvious choices when defining their analyses in SPM (particularly in the context of full- and flexible-factorial models). Unavoidably, some of those choices lead to incorrectly-defined analyses. In contrast, CONN, by design, offers a much more limited range of possible analyses that users can run. The advantage is that, for those limited analyses only, we have worked out what (hopefully) is a reasonable/general/appropriate way to implement them, so the chance of potential issues or mis-specifications is also considerably smaller. That is also why I am particularly interested in making sure that the multivariate and the two-stage univariate approaches that CONN offers for mixed-design models remain the best choices available, so I am always happy to contemplate alternatives, as well as to follow up on interesting thoughts such as the ones you have been raising in this thread.

Best

Alfonso


*Originally posted by Martyn McFarquhar:*

Hi Alfonso,

Thank you for the clarification and recommendations on the 2nd-level MVPA approach. I will pass this information on to my collaborators.

I should apologise first of all as my use of the word "correct" in terms of the F-statistic is not really accurate as you are indeed right, the whitening will guarantee (when all other assumptions are valid) that the test statistic will follow an F-distribution under the null. This is confirmed by your simulations.

The question is really one about the most appropriate F-statistic to use to test specific hypotheses, which comes down to the issue of partitioned vs pooled errors. The majority of statistical packages use the traditional "split-plot" approach for testing hypotheses in repeated-measures models because the tests are more sensitive. As such, we would do well in neuroimaging to take the same approach. I would hazard that this is the expectation of most researchers, namely that their models can match the models implemented in other packages.
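For readers following along, the partitioned- vs. pooled-error distinction can be made concrete with a small simulation (entirely hypothetical data, not from the attached script): in a one-way repeated-measures design, the partitioned ("split-plot") approach tests the condition effect against the condition-by-subject interaction, while a pooled approach leaves subject variability in the error term:

```python
import numpy as np

# Hypothetical data: 10 subjects x 3 conditions, with strong subject effects
rng = np.random.default_rng(1)
n, k = 10, 3
Y = rng.standard_normal((n, k)) + 3.0 * rng.standard_normal((n, 1))

grand = Y.mean()
SS_cond = n * ((Y.mean(axis=0) - grand) ** 2).sum()
SS_subj = k * ((Y.mean(axis=1) - grand) ** 2).sum()
SS_tot = ((Y - grand) ** 2).sum()
SS_err = SS_tot - SS_cond - SS_subj  # condition-by-subject interaction

# Partitioned error: condition effect vs. its own interaction term
F_part = (SS_cond / (k - 1)) / (SS_err / ((k - 1) * (n - 1)))
# Pooled error: subject variability left in the denominator
F_pool = (SS_cond / (k - 1)) / ((SS_tot - SS_cond) / (n * k - k))

print(F_part, F_pool)  # with large subject effects, F_part > F_pool
```

With sizeable between-subject variability, the pooled denominator is inflated and the test loses sensitivity, which is the motivation for partitioning the error in the first place.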

Unfortunately, as far as I can tell, the two-stage approach advocated in the SPM chapter you indicated, and on the SPM Wiki (https://en.wikibooks.org/wiki/SPM/Group_...), doesn't agree with other stats packages when the within-subject factor has > 2 levels. Correct partitioned errors seem to be achievable only through the flexible factorial approach (using a full over-parameterised design) or through careful combination of first-level contrasts and first-level averages.

I have attached a script and some data demonstrating this. The models looking at within-subject effects with 2 levels can actually be corrected by using two-sample t-test models for all comparisons (rather than the one-sample models advocated on the Wiki); however, there is no such simple fix for those with 3 or more levels, as far as I can tell. The models given for the partitioned-errors examples are actually simplifications of the more complex over-parameterised models that I discuss in a recent preprint (https://psyarxiv.com/a5469/), but the principles are still the same.

In terms of the approach for CONN, my feeling would be that partitioned errors are still necessary when looking across components at the second level. Taking differential effects of the second within-subject factor through to the second level is akin to the approach advocated on the Wiki, which does not result in the same statistics as produced by standard packages. I don't believe that the fact that you are looking across components rather than between components should make a difference to how the data are modelled. My feeling is that the same error term you would use when looking between components should be used when looking across components, as you are still looking at a within-subject effect (although I'd be interested to hear your take on this).

This is still a thorny issue in the community as there are questions about it almost weekly on the SPM list. I'd be interested to hear your take on this, although I realise we've gone beyond the original remit of my questions!

Best wishes,

- Martyn

