Oct 11, 2015  06:10 PM | Jenna Traynor - McMaster University
Multivariate regression - assumptions?
Hello, 

I am inquiring about the assumptions of normality and heteroscedasticity of residuals, specifically - when running a multivariate regression using the CONN toolbox, are these assumptions already controlled for in the second-level results?

I have used the toolbox to look at the linear association between 4 clinical variables and FC within a group of subjects. I am wondering if the significant results that I am seeing have normally distributed residuals and if the residuals have equal variances. Is there a way for me to extract the residuals to examine/plot them, or has this already been taken into account?

Likewise - are there any papers that I can reference that explain how CONN toolbox deals with these assumptions?

Many thanks, 

Jenna
Oct 11, 2015  07:10 PM | Alfonso Nieto-Castanon - Boston University
RE: Multivariate regression - assumptions?
Hi Jenna,

Could you please clarify the specific form of your second-level analyses? There are at least two main ways in which you could be looking at the association between 4 clinical variables and FC: either using FC as a predictor variable and clinical scores as outcome variables (e.g. in Tools.Calculator; I am guessing you might be using this since you mention multivariate regression), or using clinical scores as predictors and FC as an outcome variable (e.g. in Tools.Calculator or in the second-level results tab; my guess is that you might not be using this, since this would be just a multiple regression instead). In general, CONN uses the standard General Linear Model for all second-level analyses, so the typical normality/homoscedasticity assumptions apply, but CONN does not explicitly test or check whether any of these assumptions hold in your data. If I am understanding your analyses correctly, the "equal variance" assumption does not really apply to them, either because you are using a single FC outcome variable on a single group of subjects, or because you are using a multivariate test with multiple clinical score outcome variables (in which case the variances/covariances of the clinical scores are explicitly modeled by the multivariate test). In any case, if you let me know more details about your case, perhaps I can give you a more specific answer and/or suggest ways to test the standard normality/homoscedasticity assumptions.

Hope this helps
Alfonso
Oct 12, 2015  02:10 AM | Jenna Traynor - McMaster University
RE: Multivariate regression - assumptions?
Hi Alfonso, 

Thank you for your response. I am using 4 clinical scores of behaviour (labelled C1, C2, C3 and C4) as multiple predictor variables and FC is the outcome variable. However, I am a bit confused because I am looking at how each of these clinical scores is associated with FC in multiple source ROIs. So because I have more than one outcome variable (i.e., multiple ROIs) I thought it was a multivariate regression?

The way that I set this up is based on the fact that I initially had two groups (ASD and controls), and this was a within-group analysis on ASD subjects only: I chose 'multivariate regression' in the 1st level analysis tab, and then when viewing the second-level results I selected the following variables: ASD, C1, C2, C3, C4, and set up the contrast, for example, as [0, 1, 0, 0, 0] to look at the association between FC and C1 (while controlling for the association between FC and all of the other clinical variables?). As I move through the source ROIs on the right, I can see the association between the selected predictor variable and connectivity between that source ROI and all target ROIs (Brodmann areas).

So I guess my question was whether there is any way to extract the residuals from this model in order to examine the validity of the assumptions. Is there reason to think that the homoscedasticity assumption does not apply here? Would I want the residual errors of observed - predicted Fisher-Z values to be homogeneous across all clinical variables?

Thank you so much for your help, and please let me know if I have done something incorrectly. 

Jenna
Oct 14, 2015  12:10 AM | Alfonso Nieto-Castanon - Boston University
RE: Multivariate regression - assumptions?
Hi Jenna,

Some thoughts on your analyses first:

1) the choice of 'multivariate regression' in the first-level analysis tab is perhaps doing something different from what you intended. These measures define the meaning of the connectivity measures estimated separately for each subject (subject-level / first-level analysis). In particular, when choosing multivariate regression or semipartial correlation measures (e.g. instead of bivariate regression or bivariate correlation measures), CONN will be computing, for each seed ROI, the unique connectivity between this seed ROI and every target region after controlling for the contribution of all other seed ROIs. In other words, separately for each subject and for each target area/voxel, CONN is using a multiple regression model where it enters all of the seed/source ROI timeseries that you included in the 'sources' list as simultaneous regressors. Compared to that, when you use bivariate regression/correlation measures, CONN is using a separate regression model for each seed/source ROI (where only a single seed/source ROI timeseries is included as a regressor when fitting each target area/voxel). So, unless you have a reason to be interested in the unique connectivity associated with each seed ROI, I would suggest using bivariate correlation/regression measures instead in your first-level analysis model.

2) just to make sure: if your conn project includes both ASD and control subjects, and you wish to look at the association between any of your clinical score variables C1 to C4 with FC only within ASD subjects, you should:

   a) make sure that the C1 to C4 second-level covariates contain zero values for all of the control subjects
   b) have 'ASD' and 'control' second-level covariates dummy-coding your subject groups (e.g. the 'ASD' covariate contains 1's for your ASD subjects and 0's for controls)
   c) in the second-level results tab, select 'ASD', 'C1' ... 'C4' in the subject-effects list and enter a between-subjects contrast [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1]

Coming back to your original question, the results of this test will be those areas where FC is associated with any of your C1-C4 covariates within the ASD subject group. This test is implemented as a second-level multivariate F-test (these analyses do not assume that the Fisher-transformed Z-values are homogeneous across your C1-C4 clinical variables; for seed-to-voxel analyses the variance/covariance between your C1-C4 regressors is estimated in an initial resML step, and for ROI-to-ROI analyses that same variance/covariance structure is explicitly estimated for each target ROI).

The residuals from these models are not explicitly saved anywhere, unfortunately. Perhaps a not terribly complicated way to obtain those residuals would be to:

   a) first use 'extract values' to get the actual ROI-to-ROI or seed-to-voxel connectivity values of interest for each subject (e.g. into a new second-level covariate variable named 'FC')

 and b) explicitly create a new second-level covariate (e.g. named 'residuals') and enter in its values field the following:

    FC - FC / [ASD;C1;C2;C3;C4] * [ASD;C1;C2;C3;C4]

(change the 'FC', 'ASD', and 'C1' to 'C4' entries above to the actual names that you used in your CONN project for those second-level covariates). The new 'residuals' covariate will contain the residuals (one value per subject) from the second-level model used in your original multivariate test.
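
For anyone who prefers to check the residuals outside of CONN, the same calculation can be sketched as an ordinary least-squares fit of FC on the regressors ASD, C1, C2, C3 and C4. This is only a minimal illustration in plain Python: all subject values are invented, and the helper names `solve` and `ols_residuals` are hypothetical, not part of CONN.

```python
# Sketch only: every number below is invented for illustration.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def ols_residuals(regressors, y):
    """regressors: list of lists (one list per regressor, one value per subject)."""
    k = len(regressors)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(p * q for p, q in zip(regressors[i], regressors[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(regressors[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * regressors[i][s] for i in range(k))
              for s in range(len(y))]
    return beta, [obs - fit for obs, fit in zip(y, fitted)]


# Eight hypothetical ASD subjects followed by two controls.
asd = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]                      # dummy-coded group
c1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 0.0, 0.0]   # zero for controls
c2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 0.0, 0.0]
c3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
c4 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 0.0, 0.0]
fc = [0.31, 0.12, 0.45, 0.27, 0.52, 0.40, 0.61, 0.38, 0.20, 0.15]

design = [asd, c1, c2, c3, c4]
beta, residuals = ols_residuals(design, fc)

# Keep only the ASD subjects: the controls have all-zero regressors,
# so their "residual" is simply their raw FC value.
asd_residuals = [r for r, g in zip(residuals, asd) if g == 1]
print(asd_residuals)
```

Note that control subjects have zeros in every regressor, so the model predicts zero for them and their residual is just their FC value. It is probably sensible to drop them before plotting the residuals or testing them for normality.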

Hope this helps
Alfonso





Oct 14, 2015  12:10 PM | Jenna Traynor - McMaster University
RE: Multivariate regression - assumptions?
Hi Alfonso, 

Thank you so much for this clarification. Would it then be accurate to say that using multivariate regression is better when looking for the unique contribution of each ROI that is within a known network? For example, if I was looking at the DMN and wanted to see the unique contribution of the PCC after controlling for the contribution of the MPFC, LLP, etc., then it would be better to employ multivariate methods? But if examining a number of different a priori ROIs that are not necessarily a part of a FC network (as in my study; i.e., insula, hippocampus, putamen) then it would be better to use separate regression models for each ROI (i.e., bivariate)?

With regard to the way I set it up, yes I did employ the steps that you suggested by using dummy-coding and making C1-C4 variables all 0 for control subjects.

However after doing step c) in the second-level results tab select 'ASD', 'C1' ... 'C4' in the subject-effects list and enter a between-subjects contrast [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1]

... would I scroll through the source ROIs on the right separately to see how my predictor variables are associated with each ROI? Or together, since the results are only giving me connectivity associated with ANY of the four C1 - C4 variables?

And for future - if I did ever want to employ multivariate regression would I also set up the second-level results in the same way? ie [0,1,0,0,0; 0,0,1,0,0; 0,0,0,1,0; 0,0,0,0,1]

Thank you so so much, this has been so helpful!

Jenna
Oct 14, 2015  05:10 PM | Alfonso Nieto-Castanon - Boston University
RE: Multivariate regression - assumptions?
Hi Jenna,

Some thoughts on your questions below
Best
Alfonso
Originally posted by Jenna Traynor:
Hi Alfonso, 

Thank you so much for this clarification. Would it then be accurate to say that using multivariate regression is better when looking for the unique contribution of each ROI that is within a known network? For example, if I was looking at the DMN and wanted to see the unique contribution of the PCC after controlling for the contribution of the MPFC, LLP, etc., then it would be better to employ multivariate methods? But if examining a number of different a priori ROIs that are not necessarily a part of a FC network (as in my study; i.e., insula, hippocampus, putamen) then it would be better to use separate regression models for each ROI (i.e., bivariate)?

Yes, exactly.
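
A toy example may make the distinction concrete. The sketch below uses two made-up, zero-mean seed timeseries, where the second seed partly overlaps with the first and the target signal is driven by the first seed only. It compares separate (bivariate) regressions with a single joint (multivariate) regression. The data and function names are hypothetical and this is not CONN's implementation, just the underlying regression arithmetic.

```python
# Sketch only: invented, zero-mean timeseries for two seeds and one target.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bivariate_beta(seed, target):
    """Regression slope when a single seed is the only regressor."""
    return dot(seed, target) / dot(seed, seed)


def multivariate_betas(seed_a, seed_b, target):
    """Slopes when both seeds are entered simultaneously (2x2 normal equations)."""
    aa, ab, bb = dot(seed_a, seed_a), dot(seed_a, seed_b), dot(seed_b, seed_b)
    at, bt = dot(seed_a, target), dot(seed_b, target)
    det = aa * bb - ab * ab
    return (bb * at - ab * bt) / det, (aa * bt - ab * at) / det


seed_a = [1.0, -1.0, 2.0, -2.0, 0.0, 3.0, -3.0, 0.0]
extra = [0.5, 0.5, -0.5, -0.5, 1.0, -1.0, 0.0, 0.0]
# seed_b shares most of its signal with seed_a, plus a component of its own.
seed_b = [0.8 * a + e for a, e in zip(seed_a, extra)]
# The target is driven by seed_a only.
target = [0.5 * a for a in seed_a]

biv_a = bivariate_beta(seed_a, target)
biv_b = bivariate_beta(seed_b, target)
multi_a, multi_b = multivariate_betas(seed_a, seed_b, target)

print("bivariate:   ", biv_a, biv_b)
print("multivariate:", multi_a, multi_b)
```

In this toy case the bivariate slope for the second seed is clearly non-zero (about 0.6) because it borrows signal from the first seed, while its multivariate slope is essentially zero, since it has no unique contribution once the first seed is in the model.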

With regard to the way I set it up, yes I did employ the steps that you suggested by using dummy-coding and making C1-C4 variables all 0 for control subjects.

However after doing step c) in the second-level results tab select 'ASD', 'C1' ... 'C4' in the subject-effects list and enter a between-subjects contrast [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1]

... would I scroll through the source ROIs on the right separately to see how my predictor variables are associated with each ROI? Or together, since the results are only giving me connectivity associated with ANY of the four C1 - C4 variables?

Sorry, I missed the part in your original question where you had mentioned the multiple seed ROIs. If you want to look at each seed separately, then you would simply select each seed ROI individually in the 'sources' list, while if you want to look at the effects across any of your seeds (appropriately controlling for the number of seed ROIs), then you would select all of them simultaneously and enter an eye(N) contrast (the default contrast when selecting multiple seeds). If the number of seed ROIs is relatively large compared to your sample size, then these analyses might not have sufficient degrees of freedom or sufficient power. In that case you may click on the 'results explorer' button, select your multiple seed ROIs in the 'sources' list there, and use one of the additional thresholding options available there (e.g. seed-level FDR-corrected stats, network based statistics, etc.).
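
The 'seed-level FDR-corrected stats' option mentioned above relies on false discovery rate control. As a rough illustration of the general idea only, here is a minimal sketch of the standard Benjamini-Hochberg procedure applied to a handful of made-up p-values. It is not CONN's own code, and the function name and numbers are hypothetical.

```python
# Sketch only: generic Benjamini-Hochberg FDR procedure on invented p-values.

def benjamini_hochberg(pvalues, q=0.05):
    """Return one True/False flag per p-value: True if it survives FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p-value falls under the BH line q * rank / m.
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical uncorrected p-values, one per seed ROI.
p_per_seed = [0.001, 0.045, 0.025, 0.20, 0.008]
significant = benjamini_hochberg(p_per_seed, q=0.05)
print(significant)
```

With these numbers the three smallest p-values survive and the other two do not. A Bonferroni threshold of 0.05 / 5 = 0.01 would keep only two of them, which is why FDR-type corrections tend to be preferred when many seeds are tested.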

And for future - if I did ever want to employ multivariate regression would I also set up the second-level results in the same way? ie [0,1,0,0,0; 0,0,1,0,0; 0,0,0,1,0; 0,0,0,0,1]

Yes, exactly. In general, the way to define the second-level analyses is not affected by the choice of first-level connectivity measure entered in these analyses (e.g. bivariate/multivariate regression/correlation seed-to-voxel or ROI-to-ROI measures, but also PPI, voxel-to-voxel, graph measures, etc.). After estimating any of those connectivity measures for each subject in the first-level analysis step, they can then be analyzed across subjects using the same second-level general linear model definitions/contrasts.

Hope this helps
Alfonso
Oct 26, 2015  02:10 PM | Jenna Traynor - McMaster University
RE: Multivariate regression - assumptions?
Hi Alfonso, 

Thank you again for your help. Regarding your last post, I just have a couple more questions. To remind you I am using a bivariate regression to look at the relationship between four clinical variables C1 - C4 and connectivity in a group of subjects with ASD (dummy coded, as I had a control group in a previous analysis). I also have multiple seed ROIs.

I did what you said and selected 'ASD' 'C1' 'C2' 'C3' 'C4' and entered [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1] in the between-subjects contrast. Then, because I have multiple ROIs, I selected them all in the between-sources contrast and left the contrast as eye(N), to appropriately control for the number of ROIs I have.

However, this tests the effects across any of my seeds with any of the C1 - C4 variables. I am specifically interested in knowing how connectivity differs between C1 - C4, as well as specifically which ROIs are contributing to these differences. Is there another contrast I can enter to look at this?

Thank you, 

Jenna
Oct 28, 2015  05:10 PM | Jenna Traynor - McMaster University
Multivariate F test - post hoc
Hi Alfonso, 

After posting my last question, I have figured out how to look at the contribution of each of my clinical variables alone. I now have a question regarding whether my data are corrected for multiple comparisons, at the post hoc level. 

To remind you, I am using a multivariate F test (bivariate regression) to look at how 4 clinical variables (C1, C2, C3 and C4) can predict FC in a number of regions of interest.

I selected 'ASD' 'C1' 'C2' 'C3' 'C4' and entered [0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0; 0 0 0 0 1] in the between-subjects contrast. Then, after importing values and navigating to the tools calculator, because I have multiple source ROIs, I selected them all simultaneously to appropriately control for the number of comparisons.

I now have several main effects. However, in order to see where these main effects are coming from, I am looking at the simple main effect of each of my clinical variables alone by selecting "ASD" and "C1" and entering the contrast [0, 1] in the tools calculator window. I am wondering if I need to apply an additional correction here by using the Bonferroni correction and dividing my p-value threshold by the number of comparisons (i.e., outcome variables) I have?

Additionally, if I see a significant main effect, but when examining the simple main effect, it does not meet significance with the Bonferroni correction - how can I interpret this? 

Thank you as always, 

Jenna
Oct 1, 2025  12:10 PM | Shruti Kinger
RE: Multivariate regression - assumptions?

Hello Dr. Alfonso,


I have a query regarding the interpretation of second-level analysis results. I have behavioural variables as predictors and the functional connectivity as the outcome variable for the second-level analysis. Could you explain how the interpretation of second-level results would vary if I choose multivariate regression instead of bivariate regression in the first-level analysis tab? Could you provide a specific example where multivariate regression would be a better approach?


I am using the Jülich Brain Atlas, where each region is parcellated into multiple subregions (e.g., the Insula is parcellated into 32 subregions). In such a case, would multivariate regression be more useful?


If I enter 150 such regions from the Jülich Brain Atlas in sources/seeds/ROI, will it control for the effect of all 149 seeds and show unique connectivity for just one region? Is entering 150 regions as ROIs of interest a good idea, given that the p-value may not survive after controlling for the number of comparisons?


Thank you for your time


Shruti