Sep 3, 2021  07:09 PM | Alexandra Anagnostopoulou
Help with design matrix and contrast for 4x3 model
Dear Dr. Zalesky,

I have a dataset that consists of 3 separate groups (HC,SZ,MDD) recorded in 4 different sites, measuring the conditions DL and DW (in total 131 subjects). I am trying to test the effect of the sites as well as their interaction with the diagnosis (with age and gender as covariates). My design matrix is:

Design Matrix
----------------
1 1 0 1 0 0 0 25 1 1 0 0 0
1 1 0 0 0 1 0 30 0 0 1 0 0
1 1 0 0 1 0 0 39 1 0 0 1 0
1 1 0 0 0 0 1 40 0 0 0 0 1
1 1 0 1 0 0 0 25 1 1 0 0 0
1 1 0 0 0 1 0 30 0 0 1 0 0

...

1 0 1 1 0 0 0 26 1 1 0 0 0
1 0 1 0 1 0 0 45 0 0 1 0 0
1 0 1 0 0 1 0 50 0 0 0 1 0
1 0 1 0 0 0 1 41 0 0 0 0 1
1 0 1 1 0 0 0 26 1 1 0 0 0
1 0 1 0 1 0 0 45 0 0 1 0 0
...

1 0 0 1 0 0 0 32 1 1 0 0 0
1 0 0 1 0 0 0 39 1 0 1 0 0
1 0 0 0 1 0 0 50 0 0 0 1 0
1 0 0 0 0 0 1 48 0 0 0 0 1
1 0 0 0 1 0 0 32 1 1 0 0 0
1 0 0 0 1 0 0 39 1 0 1 0 0

with the col 1: global mean, cols 2-3: group 1&2, cols 4-7: site 1-4, col 8: age, col 9: gender, remaining cols modelling observations from the same subject.

1. Is the design matrix correct for what I'm trying to test? Should I leave out the last columns that mark the observations from the same subject? (In a simple ANOVA, without the site variables and covariates, I get a rank-deficient warning/error.)


2. What is the appropriate contrast to test the differences between sites and/or the interaction of site with diagnosis (group)? Should it be along the lines of [0,1,-1,1,1,1,0...]?
Sep 3, 2021  11:09 PM | Andrew Zalesky
RE: Help with design matrix and contrast for 4x3 model
Hi Alexandra, 

the design matrix is over-parameterized and will give a rank deficient warning. You will need to remove one of the columns corresponding to site. So you should only need to model 3 of the 4 sites. It does not matter which site column is removed - the results will be the same irrespective of which site column is removed. Note that if you have information about 3 of the 4 sites, the identity of the 4th site can be inferred. 
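
The redundancy can be checked numerically. The following is only a minimal sketch in Python with NumPy, using six made-up observations rather than the real data; the site labels and variable names are assumptions for illustration. It shows that a column of 1's plus all four site columns has one more column than its rank, and that dropping any one site column fixes this.

```python
import numpy as np

# Hypothetical toy data: 6 observations with site labels 0..3 (made up).
site = np.array([0, 1, 2, 3, 0, 1])
intercept = np.ones((site.size, 1))
site_dummies = np.eye(4)[site]            # one indicator column per site

# Column of 1's plus all four site columns: the site columns sum to the 1's.
X_full = np.hstack([intercept, site_dummies])
# Drop one site column (here the last one) to remove the redundancy.
X_reduced = np.hstack([intercept, site_dummies[:, :3]])
# Dropping a different site column (here the first one) instead.
X_alt = np.hstack([intercept, site_dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)
# If both reduced matrices span the same space, stacking them adds no rank.
rank_both = np.linalg.matrix_rank(np.hstack([X_reduced, X_alt]))

print(X_full.shape[1], rank_full)         # 5 columns, rank 4
print(X_reduced.shape[1], rank_reduced)   # 4 columns, rank 4
print(rank_both)                          # 4
```

Because the two reduced matrices span the same column space, the fitted model is the same whichever site column is removed.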

If you have repeated measurements for all subjects and within-subject means are modelled, the first column of 1's (global mean) should also be removed. It wasn't clear to me if you have repeated measures for all subjects, or just a few. If you only have repeated measurements for a few subjects, the column of 1's would still be needed. 

To test for an effect of site, you would use a contrast of the form [0 0 0 1 1 1 0 0 0 ....], where the 1's are positioned at the location of the three site variables. Use the F-test. Note that this will only test for a linear effect of site, and it is important to bear in mind that site effects can be non-linear. 

To test for a site-by-diagnosis interaction, you would need to add additional columns to your design matrix. This would involve multiplying (elementwise) the diagnosis and site columns to generate new regressors. 
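
As a rough sketch of that elementwise multiplication (Python with NumPy; the six observations, labels and variable names are made up for illustration and are not the real data), each of the two diagnosis columns is multiplied by each of the three retained site columns, giving six interaction regressors:

```python
import numpy as np

# Hypothetical labels for 6 observations (made up for illustration).
group = np.array([0, 0, 1, 1, 2, 2])   # three diagnosis groups; the third is the reference
site = np.array([0, 1, 2, 3, 0, 1])    # four sites; the fourth is the reference

group_cols = np.eye(3)[group][:, :2]   # two diagnosis indicator columns
site_cols = np.eye(4)[site][:, :3]     # three site indicator columns

# Multiply every diagnosis column elementwise with every site column.
interaction = np.hstack(
    [group_cols[:, [g]] * site_cols for g in range(group_cols.shape[1])]
)

print(interaction.shape)   # (6, 6): 2 diagnosis columns x 3 site columns
print(interaction[2])      # observation in group 1, site 2: only the last column is 1
```

The interaction columns would be appended to the design matrix, and the contrast would then place its 1's over those new columns.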

best,
Andrew



Sep 8, 2021  05:09 PM | Alexandra Anagnostopoulou
RE: Help with design matrix and contrast for 4x3 model
Dear Dr Zalesky,

Thank you so much for the swift reply!!

I have implemented what you suggested, but I am still having some issues, so I was wondering if I could pick your brain regarding these matters.

Q1. To answer your question: yes, all of the subjects have repeated measures. I have removed the global mean and the column representing one of the sites and run an F-test with the contrast [0,0,1,0,0,...], as you suggested, but I am still getting the "rank deficient" warning. I played around with the model and found that the warning appears as soon as I declare the repeated measures, but I have no idea what I'm doing wrong or how to fix it. As I said previously, I have three groups (HC,SZ,MDD) with 2 conditions (DL,DW) and subjects from 4 different sites (Site 1-4). In this model I'm trying to see the effect of site, using group, age and gender as covariates. I have uploaded my full design matrix for this model, so any feedback you could give me would be greatly appreciated.

Q2. In the same paradigm but for a different comparison, I get the same rank deficient warning when declaring the repeated measures. I'm trying to compute the difference between two groups (e.g. SZ vs HC) using an F-test with the contrast [1,0,0,0,0...], with site 1-4, age and gender as covariates. I have uploaded the full design matrix for this model as well, so if you could take a look I would be very grateful. An additional question: as I am using sites 1-4 as covariates, should I have columns for all 4 sites, or can the information regarding site 1 be inferred from the other 3 sites even when they are used as covariates?

Q3. A final theoretical question that I always have problems with: when making the design matrix, sometimes 1,0 is used to differentiate between groups, conditions etc. and sometimes 1,-1 is used. How do I know which pair of values to use, for example in the case of these two models? It is not always obvious to me.

Thank you very much for your time and consideration.

Best Alexandra
Sep 9, 2021  03:09 AM | Andrew Zalesky
RE: Help with design matrix and contrast for 4x3 model
Hi Alexandra, 

1. You most likely need to remove gender as a covariate. Presumably the gender of each subject does not vary between the conditions (repeated measure)? Given that a separate column (mean) is modelled for each subject, there is no need to model additional covariates that remain constant between the conditions, such as sex. If age remains the same between conditions, it will also need to be removed. 
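
To illustrate why a covariate that is constant within subject clashes with the subject-mean columns, here is a minimal sketch in Python with NumPy. The four subjects, two conditions and gender values are made up, and the variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy design: 4 subjects, 2 conditions each (made up).
n_subjects = 4
subject = np.repeat(np.arange(n_subjects), 2)      # [0, 0, 1, 1, 2, 2, 3, 3]
condition = np.tile([1.0, -1.0], n_subjects)       # within-subject condition regressor
subject_means = np.eye(n_subjects)[subject]        # one column per subject

# Gender is constant across the two conditions of each subject.
gender_per_subject = np.array([1.0, 0.0, 1.0, 0.0])
gender = gender_per_subject[subject]

X_within = np.column_stack([condition, subject_means])
X_with_gender = np.column_stack([X_within, gender])

rank_within = np.linalg.matrix_rank(X_within)
rank_with_gender = np.linalg.matrix_rank(X_with_gender)

print(X_within.shape[1], rank_within)              # 5 columns, rank 5
print(X_with_gender.shape[1], rank_with_gender)    # 6 columns, rank 5
```

The gender column is just the sum of the subject columns for the subjects coded 1, so adding it increases the column count but not the rank.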

2. I suspect that this is the same issue as 1. 

3. Most of the time it does not matter if you use 0/1 or -1/1. The results will be the same for both cases, except in some cases where you are modelling interaction effects, where the beta value can differ between the two cases.  
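
A small numerical sketch of this point, in Python with NumPy, on made-up data for a simple two-group model with a column of 1's (an assumption for illustration; it is not one of the models above). The fitted values are identical under both codings, so a test of the group effect is unaffected; only the scale of the coefficient changes.

```python
import numpy as np

# Hypothetical data: 3 observations per group (made up for illustration).
y = np.array([1.0, 2.0, 1.5, 4.0, 5.0, 4.5])
group01 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # 0/1 coding
group11 = 2.0 * group01 - 1.0                         # -1/1 coding
ones = np.ones_like(y)

X01 = np.column_stack([ones, group01])
X11 = np.column_stack([ones, group11])

# Ordinary least-squares fits under each coding.
beta01 = np.linalg.lstsq(X01, y, rcond=None)[0]
beta11 = np.linalg.lstsq(X11, y, rcond=None)[0]

fit01 = X01 @ beta01
fit11 = X11 @ beta11

print(np.allclose(fit01, fit11))   # True: same fitted values
print(beta01[1], beta11[1])        # group coefficient under -1/1 is half that under 0/1
```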

best wishes,

Andrew
