Jun 24, 2020  03:06 PM | Shady El Damaty - Georgetown University
Statistical Flaw in Estimates of Reproducibility
Hi,

The mICA toolbox estimates reproducibility by splitting a dataset into two unique groups N times and calculating D ICA decompositions, where both N and D are specified by the user.

The Pearson correlation coefficient produced by the program `fslcc` is computed for each pair (1 to N) of D decompositions. At each repetition, the similarity matrix shows how similar component 1 in group 1 is to component 1 in group 2 (after sorting components with munkres). The final reproducibility estimates are generated by averaging the diagonals of these matrices over the N repetitions for each of the D components.
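
The averaging step described above can be sketched roughly as follows. This is only an illustration of the idea, not mICA's actual code: the function name `mean_diagonal` and the toy values are hypothetical, and each matrix is assumed to be already sorted so that matched components sit on the diagonal.

```python
def mean_diagonal(similarity_matrices):
    """Average the diagonal entries across repetitions, per component.

    similarity_matrices: list of N square D x D matrices (lists of lists),
    each already sorted so that entry [d][d] is the correlation between
    matched component d in group 1 and group 2.
    """
    n_reps = len(similarity_matrices)
    n_comp = len(similarity_matrices[0])
    return [
        sum(m[d][d] for m in similarity_matrices) / n_reps
        for d in range(n_comp)
    ]


# Toy input: N = 2 repetitions, D = 2 components (made-up values).
reps = [
    [[0.90, 0.10], [0.05, 0.40]],
    [[0.70, 0.20], [0.15, 0.60]],
]
per_component = mean_diagonal(reps)
```

Note that this is a plain arithmetic mean of raw correlation coefficients.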

The sampling distribution of Pearson's correlation coefficient exhibits several estimation biases, some related to the asymptotic behavior of the Pearson statistic and some due to sample size effects. Both apply here, where the diagonal sizes are unequal across increasing model order, and the result is under- or overestimation of the true population parameters. These estimation biases can be addressed in multiple ways.

The simplest is to apply the Fisher transformation to the correlation coefficients before averaging them. Alternatively, there are formulas that can be used to generate revised estimates, e.g., Shieh, G. (2010). Estimation of the simple correlation coefficient. Behavior Research Methods, 42(4), 906-917.
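
As a minimal sketch of the Fisher approach (not the code I actually used, and with hypothetical function names and toy values), the correlations are mapped to z = atanh(r), averaged in z space, and mapped back with tanh. The small clipping constant `eps` is an assumption added here to keep atanh finite when r is exactly +/-1.

```python
import math


def fisher_mean(correlations, eps=1e-12):
    """Average correlations in Fisher z space and map the mean back to r."""
    # Clip slightly inside (-1, 1) because atanh diverges at exactly +/-1.
    zs = [math.atanh(max(-1 + eps, min(1 - eps, r))) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


def naive_mean(correlations):
    """Plain arithmetic mean of the raw correlation coefficients."""
    return sum(correlations) / len(correlations)


# Toy diagonal entries for one component across three repetitions.
rs = [0.95, 0.50, 0.20]
naive = naive_mean(rs)
fisher = fisher_mean(rs)
```

For these toy values the Fisher-averaged estimate comes out higher than the naive mean, because values near 1 are stretched more by the transformation. When all inputs are equal, the two agree.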

In some cases the estimation error is minor. However, where local maxima of reproducibility are highly sensitive to component splitting (because of noise or highly variable sources), it can significantly bias the selection of the model order taken as representative of the studied region.

In my case, I found significant differences and have stopped using mICA for post-hoc reproducibility analysis. The job scheduling is still quite useful, but the final reproducibility plots are incorrect for my purposes. I've written my own code to get around these issues. Attached is a plot comparing the revised estimates with the original ones.
