WARNING: possibly incorrect model
Mar 16, 2021  06:03 PM | Mikey Malina - University of Chicago
WARNING: possibly incorrect model
Hello,

I am conducting second-level analyses on a large dataset (1084 subjects, with many second-level covariates) and frequently get the warning below when running contrasts:

WARNING: possibly incorrect model: non-estimable contrasts (suggestion: simplify second-level model)

I'm not sure what the problem is: I can't simplify the model without discarding valuable information, and I can't find anything explicitly wrong with the way it is set up. Of the ~30 contrasts I am running in this analysis, about 15 show this warning. I have attached a document showing the warning window in CONN.

Thank you,
Mikey
Mar 21, 2021  09:03 PM | Alfonso Nieto-Castanon - Boston University
RE: WARNING: possibly incorrect model
Hi Mikey,

The "non-estimable contrast" warning means that your contrast cannot be uniquely estimated from the data (typically because either the between-subjects contrast itself is incorrectly defined, or because your predictors include some redundancies which your contrast does not acknowledge; e.g. if I create an analysis with predictors 'AllSubjects', 'Patients', and 'Controls', then the contrast [0 1 -1] is perfectly estimable but the contrast [1 0 0] cannot be estimated). Could you please let me know the details of your second-level model (in particular the choice of 11 predictors entered into your GLM, and the between-subjects contrast used, for at least one of these analyses)?
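The estimability condition described here can be checked numerically: a contrast c is estimable exactly when it lies in the row space of the design matrix X, i.e. appending it as an extra row does not increase the matrix rank. A minimal Python/NumPy sketch of the AllSubjects/Patients/Controls example (the group sizes here are invented for illustration, and this is not CONN code):

```python
import numpy as np

# Hypothetical design with the redundancy described above:
# the AllSubjects column is the sum of the Patients and Controls columns.
n_pat, n_con = 4, 3
patients = np.r_[np.ones(n_pat), np.zeros(n_con)]
controls = 1.0 - patients
all_subjects = np.ones(n_pat + n_con)
X = np.c_[all_subjects, patients, controls]  # columns: AllSubjects, Patients, Controls

def is_estimable(X, c, tol=1e-10):
    """A contrast c is estimable iff it lies in the row space of X,
    i.e. appending it as an extra row does not increase the rank."""
    return np.linalg.matrix_rank(np.vstack([X, c]), tol=tol) == np.linalg.matrix_rank(X, tol=tol)

print(is_estimable(X, np.array([0, 1, -1])))  # True: the group difference is estimable
print(is_estimable(X, np.array([1, 0, 0])))   # False: AllSubjects alone cannot be isolated
```

The same rank check is what the warning reflects: any contrast that tries to isolate one of the redundant columns on its own fails it.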

Thanks
Alfonso
Mar 23, 2021  10:03 PM | Mikey Malina - University of Chicago
RE: WARNING: possibly incorrect model
Hey Alfonso,

Thank you for your response; it makes a lot of sense. I have attached a file with 5 different examples of contrasts that produce the warning, along with their respective parameters.
Attachment: WarningInfo.docx
Apr 2, 2021  08:04 PM | Mikey Malina - University of Chicago
RE: WARNING: possibly incorrect model
Hey Alfonso,

I am following up on my previous comment. I have reattached the Warning Info doc, now with more variable information. The contrasts producing warnings (12 in total, 5 included in the document) are the following; the 'Med' variants add a covariate quantifying anti-hallucinatory medication use.

Current Hallucination (and Med)
Past Hallucination (and Med)
Never Hallucination (and Med)
HC & PTS (and Med), where HC = healthy controls and PTS = patients
PTS (and Med)
HC
All (all subjects)

Thank you for your time,
Mikey
Attachment: WarningInfo.docx
Apr 9, 2021  10:04 AM | Alfonso Nieto-Castanon - Boston University
RE: WARNING: possibly incorrect model
Hi Mikey,

Yes, those between-subjects contrasts are likely all non-estimable due to a range of issues, from some of your models having multiple redundancies (e.g. in the PTS model, the Proband and OnlyProband regressors are likely identical within the PTS-group subjects) to the control covariates not having been centered (e.g. what is the zero level of the "site" covariate "boston/chicago/dallas/etc."?).

I would recommend the following:

1) First center (subtract the average across all subjects) any control covariates, e.g. GoodQA2, mFDpower, Age, Sex, boston, chicago, dallas, georgia (alternatively you could include the average value of these covariates as part of the contrast vector, but I find this centering approach generally simpler).

2) Define a model across all subjects (subgroup-specific analyses can be misleading because the control covariates would then be controlled 'separately' within each group, which may or may not be what you intend), e.g.

   GoodQA2;mFDpower;Age;Sex;boston;chicago;dallas;georgia;HealthyControl;Proband

3) Use the between-subjects contrast to specify which subgroup you want to evaluate, or which groups you want to compare, e.g.

   [0 0 0 0 0 0 0 0 1 0] to look at the average connectivity within HC
   [0 0 0 0 0 0 0 0 0 1] to look at the average connectivity within PTS
   [0 0 0 0 0 0 0 0 -1 1] to look at the difference between PTS and HC groups
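To see the three steps end to end, here is a toy Python/NumPy sketch (not CONN code; the group sizes, the single 'age' stand-in for the full list of control covariates, and the random "connectivity" values are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
hc  = np.r_[np.ones(10), np.zeros(10)]  # HealthyControl indicator
pts = 1.0 - hc                          # Proband/patient indicator
age = rng.uniform(20, 60, n)            # stand-in control covariate

# step 1: center control covariates across ALL subjects
age_c = age - age.mean()

# step 2: one model across all subjects (columns: age_c, HC, PTS)
X = np.c_[age_c, hc, pts]
y = rng.normal(size=n)                  # stand-in connectivity values
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# step 3: between-subjects contrasts select the question
c_hc   = np.array([0, 1, 0])   # average connectivity within HC
c_pts  = np.array([0, 0, 1])   # average connectivity within PTS
c_diff = np.array([0, -1, 1])  # PTS minus HC
print(c_hc @ beta, c_pts @ beta, c_diff @ beta)
```

The model is fit once across everyone; only the contrast vector changes from question to question, which is what keeps every contrast estimable.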

Hope this helps
Alfonso


Apr 9, 2021  06:04 PM | Mikey Malina - University of Chicago
RE: WARNING: possibly incorrect model
Thanks for your response, Alfonso. Some follow-up questions/clarifications:

1. We define 5 sites in our project, but only the 4 sites listed in the example contrasts are used for analysis, to avoid dummy-coding redundancy. This 5th site would be the zero level, right, or are you referring to something else? It was my impression that centering covariates was optional and that it mainly helps interpret the intercept; I'm not clear what that means for our project, especially with all the site covariates. Does not centering affect the actual significant clusters that can appear for a test?

The questions we are most often interested in are: how does connectivity change in relation to a patient's symptom covariate (e.g. hallucinations); are there interactions between patient subgroups for a symptom; and how does connectivity differ among patient subgroups and healthy controls?

We were attempting to define the one-sample subgroup analyses to get a better baseline understanding of connectivity in the ROIs, to aid interpretation.

2. Sorry, I'm unfamiliar with the danger of controlling variables "separately" within groups. Can you clarify or provide any references on this issue? So, say, for a single manuscript, are you generally suggesting that all the subject groups referenced should be included in each analysis, using models defined across all subjects?
Since our project sample is fairly large, we've had concerns about including all the subjects (minus the BadQA subjects) in every analysis.

3. So is what is happening here still equivalent to a one-sample t-test, or is it a bit different?
I must admit I've been nervous about variance from the other subjects somehow influencing the different subgroup analyses, but that may just be major confusion on my part about what's happening. I'm also confused because the sample sizes for the subgroup analyses are now much bigger than for a standard one-sample t-test.

Also, to clarify the strategy you listed: how would we know which subgroups to put in the model, and when? What about when we want to do other one-sample tests of SZC, BPD, SAD, Bio1-3, and other subgroups like NeverHallucinators? Some of these subgroups are mutually exclusive and some overlap.

Or is having our GoodQA2 inclusion variable enough to cover the subject space, so that we just add the subgroup for each within-sample connectivity test?

4. Regarding the use of Proband and OnlyProband: I understand the redundancy issue, but the reason we've been using both is concern over subject-number inflation. This is related to the above, and I expand on it with our examples in the attached file (with information cross-listed in the following post: https://www.nitrc.org/forum/forum.php?thread_id=12308&forum_id=1144). Quick gist: when removing the onlySZP covariate from the contrast (keeping SZP), the warning goes away, but our subject n shoots up from ~204 to 769.

Thank you,
Mikey
Apr 9, 2021  10:04 PM | Alfonso Nieto-Castanon - Boston University
RE: WARNING: possibly incorrect model
Hi Mikey,
Some thoughts on your questions below
Best
Alfonso

1. We define 5 sites in our project, but only the 4 sites listed in the example contrasts are used for analysis, to avoid dummy-coding redundancy. This 5th site would be the zero level, right, or are you referring to something else? It was my impression that centering covariates was optional and that it mainly helps interpret the intercept; I'm not clear what that means for our project, especially with all the site covariates. Does not centering affect the actual significant clusters that can appear for a test?

The questions we are most often interested in are: how does connectivity change in relation to a patient's symptom covariate (e.g. hallucinations); are there interactions between patient subgroups for a symptom; and how does connectivity differ among patient subgroups and healthy controls?

We were attempting to define the one-sample subgroup analyses to get a better baseline understanding of connectivity in the ROIs, to aid interpretation.

Centering (or not) your control covariates affects the intercept terms in your model, so it will NOT affect the statistics of either between-group differences (e.g. GroupA vs. GroupB connectivity differences, since both groups are affected by the centering equally) or correlation analyses (e.g. associations with a symptom covariate). What centering/not centering does change, though, are your estimated "simple main effects" (the 'corrected' average connectivity within a group), since those 'corrected' values are estimated at the zero level of your covariates. For example, if you have an ANCOVA like:

   Model: "GroupA", "GroupB", and "age"
   Contrast: [1 -1 0]

the results of that test will be exactly the same whether you center the "age" covariate or not. But, when you try to estimate the simple main effects of this same model (to help interpret the results), for example asking what is the 'corrected' average connectivity within each group, using a contrast like:

   Contrast: [1 0 0] (to look at GroupA)
             [0 1 0] (to look at GroupB)

the results of these analyses will be totally different depending on whether or not you centered the age covariate, because the 'corrected' connectivity will be estimated for subjects that have age = 0. If you entered raw values for "age" (e.g. age in years), then you are asking what the estimated connectivity of newborns is; if you entered centered values (e.g. age in years minus the average age in your sample), then you are asking what the estimated connectivity is for a person whose age equals the average age of your sample.
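The two claims here, that centering leaves the group difference untouched while shifting the simple main effects, can be verified with a quick simulation. A hedged Python/NumPy sketch (group sizes, effect sizes, and the noise level are invented for illustration):

```python
import numpy as np

# Toy ANCOVA: two groups plus an "age" control covariate.
rng = np.random.default_rng(1)
n = 30
groupA = np.r_[np.ones(15), np.zeros(15)]
groupB = 1.0 - groupA
age = rng.uniform(20, 60, n)
y = 0.5 * groupA - 0.2 * groupB + 0.01 * age + rng.normal(scale=0.1, size=n)

def fit(age_col):
    """Fit the GLM [GroupA, GroupB, age] and return the beta estimates."""
    X = np.c_[groupA, groupB, age_col]
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(age)               # "age" entered as raw years
b_ctr = fit(age - age.mean())  # "age" centered across all subjects

diff  = np.array([1, -1, 0])   # GroupA vs GroupB difference
mainA = np.array([1, 0, 0])    # simple main effect of GroupA

print(np.isclose(diff @ b_raw, diff @ b_ctr))    # True: group difference unchanged
print(np.isclose(mainA @ b_raw, mainA @ b_ctr))  # False: estimated at age=0 vs age=mean
```

Algebraically, centering shifts each group intercept by slope times mean age, and that shift cancels in any group-difference contrast but not in a single-group contrast.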
2. Sorry, I'm unfamiliar with the danger of controlling variables "separately" within groups. Can you clarify or provide any references on this issue? So, say, for a single manuscript, are you generally suggesting that all the subject groups referenced should be included in each analysis, using models defined across all subjects?
Since our project sample is fairly large, we've had concerns about including all the subjects (minus the BadQA subjects) in every analysis.

Sorry, I was not too clear: there is no danger per se in controlling separately within each group; it is just that the effect of the control covariates is totally different. For example, removing differences between scanner sites "across all subjects" in an analysis that compares patients to controls is different from removing differences between scanner sites "only in patients" in the same analysis, or from removing them "separately in patients and controls"; the two latter examples represent a group-by-site interaction instead of a main site effect as in the former.
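The distinction can be made concrete with design matrices. In this hypothetical two-site sketch (Python/NumPy; the site composition, effect sizes, and noise are all invented), controlling site across all subjects uses one shared site regressor, while controlling it separately per group uses group-by-site interaction regressors, and the adjusted group difference generally comes out different:

```python
import numpy as np

# Unbalanced toy sample: most patients scanned at site 1, most controls at site 0.
rng = np.random.default_rng(2)
group = np.r_[np.ones(6), np.zeros(6)]                     # patients / controls
site  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1.0])  # scanner site
y = 0.3 * group + 0.5 * site + rng.normal(scale=0.05, size=12)

site_c = site - site.mean()
# main site effect: one site regressor shared across ALL subjects
X_main = np.c_[group, 1 - group, site_c]
# site controlled SEPARATELY per group: group-by-site interaction regressors
X_sep  = np.c_[group, 1 - group, site_c * group, site_c * (1 - group)]

def adjusted_diff(X, c):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return c @ beta

print(adjusted_diff(X_main, np.array([1, -1, 0])))     # patients minus controls, main-effect model
print(adjusted_diff(X_sep,  np.array([1, -1, 0, 0])))  # generally a different estimate
```

The interaction model spends an extra degree of freedom (one site slope per group), which is exactly the difference between a main site effect and a group-by-site interaction.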

3. So is what is happening here still equivalent to a one-sample t-test, or is it a bit different?
I must admit I've been nervous about variance from the other subjects somehow influencing the different subgroup analyses, but that may just be major confusion on my part about what's happening. I'm also confused because the sample sizes for the subgroup analyses are now much bigger than for a standard one-sample t-test.

Also, to clarify the strategy you listed: how would we know which subgroups to put in the model, and when? What about when we want to do other one-sample tests of SZC, BPD, SAD, Bio1-3, and other subgroups like NeverHallucinators? Some of these subgroups are mutually exclusive and some overlap.

Or is having our GoodQA2 inclusion variable enough to cover the subject space, so that we just add the subgroup for each within-sample connectivity test?

A simple rule of thumb would be to simply make all post-hoc analyses (analyses derived from an original analysis and meant just to help interpret the results of this original analysis) use the same model as the original analysis. 


4. Regarding the use of Proband and OnlyProband: I understand the redundancy issue, but the reason we've been using both is concern over subject-number inflation. This is related to the above, and I expand on it with our examples in the attached file (with information cross-listed in the following post: https://www.nitrc.org/forum/forum.php?thread_id=12308&forum_id=1144). Quick gist: when removing the onlySZP covariate from the contrast (keeping SZP), the warning goes away, but our subject n shoots up from ~204 to 769.

Right, that's perfectly fine: simply keeping "OnlyProband" and removing "Proband" from the model will give you the correct/reduced number of subjects while avoiding the redundancy (alternatively, you can keep both covariates and simply enter a contrast [0 ... 0 1 1], which is exactly equivalent to the above solution; it is just a little more complicated to explain why that is the case; for a general introduction to GLM models and contrasts you may want to see the slides at https://web.conn-toolbox.org/tutorials).
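The equivalence of the [0 ... 0 1 1] contrast can be illustrated numerically. This is a toy Python/NumPy sketch, not CONN code, for the case described earlier where the Proband and OnlyProband regressors coincide over the analyzed subjects (the subject count and data are invented):

```python
import numpy as np

# Over the subjects entering this analysis, Proband and OnlyProband coincide,
# so the two-column model contains a redundancy.
rng = np.random.default_rng(3)
n = 6
only_proband = np.ones(n)
proband = only_proband.copy()          # identical within the analyzed subjects
X_full = np.c_[proband, only_proband]  # redundant two-column model
y = rng.normal(size=n)                 # stand-in connectivity values

def is_estimable(X, c, tol=1e-10):
    # c is estimable iff appending it as a row does not raise the rank of X
    return np.linalg.matrix_rank(np.vstack([X, c]), tol=tol) == np.linalg.matrix_rank(X, tol=tol)

print(is_estimable(X_full, np.array([1, 0])))  # False: Proband alone cannot be isolated
print(is_estimable(X_full, np.array([1, 1])))  # True: the summed contrast is estimable

# The [1 1] contrast on the redundant model matches the single-regressor model:
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
b_red  = np.linalg.lstsq(only_proband[:, None], y, rcond=None)[0]
print(np.isclose(b_full.sum(), b_red[0]))      # True
```

Since the two columns span the same space as one, the summed contrast recovers exactly the effect that the reduced model estimates with a single regressor.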

Hope this helps
Alfonso