**FDR: Why the nonparametric p is used?**

Jul 29, 2017 02:07 AM | Takafumi Yano - *Kyoto University*

FDR: Why the nonparametric p is used?

Hi Dr. Andrew,

I am a beginner with the NBS toolbox, and I have been using its FDR option.

I read the NBSfdr.m file to understand what it does.

I have two questions and would be grateful for your answers.

Q1

I think the nonparametric p-value is calculated from permuted t-statistics.

As far as I know, in most cases the nonparametric p-value is calculated directly from the permuted mean values. This program, however, uses the permuted t-statistic values.

Please tell me the theoretical reason for this procedure.

Q2

The BH method of controlling the FDR is well known and widely used.

However, this program uses the Simes procedure.

Please tell me what the Simes procedure is and what its advantages are.

Best regards,

Takafumi Yano

Kyoto University, Japan.

Jul 29, 2017 08:07 PM | Andrew Zalesky

RE: FDR: Why the nonparametric p is used?

Hi Takafumi,

When using FDR, it is crucial to set the number of permutations substantially higher (e.g. 10^5 or perhaps even 10^6). This is because, with FDR, permutation is used to compute a p-value for each edge individually.
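A rough way to see why the permutation count matters (this sketch is not taken from NBSfdr.m): with N permutations, a permutation p-value cannot be smaller than about 1/N, while a step-up FDR procedure over m edges needs the smallest p-value to fall below q/m if only one edge is to be declared significant. The 90-node network, the level q = 0.05 and the (count + 1)/(N + 1) convention below are assumptions chosen for illustration.

```python
def smallest_permutation_p(n_perm):
    """Smallest attainable p-value, reached when the observed statistic
    beats every permuted one.

    Assumes the (count + 1) / (n_perm + 1) convention; other conventions
    give roughly 1 / n_perm instead.
    """
    return 1.0 / (n_perm + 1)


def strictest_step_up_threshold(n_edges, q=0.05):
    """Threshold q/m that the smallest p-value must meet when only a
    single edge is to be declared significant."""
    return q / n_edges


n_nodes = 90                              # hypothetical network size
n_edges = n_nodes * (n_nodes - 1) // 2    # 4005 edges in the upper triangle

for n_perm in (5000, 10**5, 10**6):
    p_min = smallest_permutation_p(n_perm)
    threshold = strictest_step_up_threshold(n_edges)
    print(f"{n_perm:>8} permutations: p_min = {p_min:.2e}, "
          f"q/m = {threshold:.2e}, resolvable = {p_min <= threshold}")
```

Under these assumptions, 5000 permutations cannot resolve a p-value small enough for a single edge to pass, whereas 10^5 can. If many edges attain very small p-values, the later step-up thresholds (k·q/m) are looser, so this is the worst case rather than a general requirement.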

Q1. I'm not sure what you mean by permuted mean values here. The t-statistic is the test statistic. It is not itself permuted: we permute samples/individuals and then recompute the t-statistic for each permutation to build an empirical null distribution.
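For a single edge, the idea can be sketched as follows. This is a minimal illustration, not the NBS implementation: the function names, the pooled-variance form of the t-statistic, the one-sided comparison, the (count + 1)/(N + 1) convention and the edge weights are all assumptions.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """One-sided permutation p-value for the hypothesis mean(a) > mean(b).

    Individuals are shuffled between the two groups and the t-statistic is
    recomputed each time; the recomputed values form the empirical null.
    """
    rng = random.Random(seed)
    t_observed = pooled_t(a, b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # permute group membership
        t_perm = pooled_t(pooled[:len(a)], pooled[len(a):])
        if t_perm >= t_observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical connectivity values for one edge in two groups of 8 subjects.
group_a = [0.61, 0.70, 0.66, 0.74, 0.68, 0.72, 0.65, 0.69]
group_b = [0.52, 0.58, 0.49, 0.55, 0.60, 0.51, 0.57, 0.54]

print(permutation_p_value(group_a, group_b))
```

With FDR, this calculation would be repeated for every edge, which is why the number of permutations needs to be large.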

Q2. I think Simes and BH are effectively the same. Simes proposed the procedure in 1986. In 1995, Benjamini and Hochberg (BH) proved that it controls the FDR, and hence the FWER in the weak sense. In 2002, Storey proposed a variation that is useful when the proportion of true null hypotheses is small.
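The shared rule is a step-up comparison of the sorted p-values p_(1) ≤ ... ≤ p_(m) against the critical values k·q/m: find the largest k with p_(k) ≤ k·q/m and reject the k smallest. A minimal sketch, with a hypothetical function name and made-up p-values (it does not reproduce NBSfdr.m):

```python
def step_up_fdr(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Uses the critical values k * q / m on the sorted p-values and rejects
    every hypothesis up to the largest rank k that meets its critical value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank                 # keep the largest qualifying rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values for five edges, deliberately unsorted.
p_values = [0.035, 0.5, 0.001, 0.028, 0.025]
print(step_up_fdr(p_values))   # [True, False, True, True, True]
```

In this example the second-smallest p-value (0.025) exceeds its own critical value (0.02) but is still rejected, because the fourth-smallest (0.035) meets its critical value (0.04). That is what "step-up" means here.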

Andrew

Jul 31, 2017 02:07 AM | Takafumi Yano - *Kyoto University*

RE: FDR: Why the nonparametric p is used?

Hi Andrew,

Thank you for your reply and kindness.

Q1.

Sorry, my knowledge of nonparametric statistics is limited, so I could not explain exactly what I meant.

I think that in ordinary nonparametric statistics, the null distribution is derived directly from the measured variables. Take the simple case of testing the difference in mean height between the students of two classes, A and B. We collect all the heights from both classes. First, we calculate the mean heights Ha and Hb. Then we subtract Hb from Ha to obtain the mean difference D1. To test whether D1 is significant, we repeatedly permute the heights between A and B and obtain D2 to Dn. Finally, we calculate the p-value from the null distribution formed by D1 to Dn. If p is smaller than the significance level, we consider the difference significant.

However, this program uses the t-value instead of the direct variables, such as the differences D1 to Dn. In connectivity analysis, I think it would be natural to use the edge weights instead of t-values.

I would like to know why it is valid to use t-values to build the null distribution, and what the advantage is over using the direct variables.
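The mean-difference procedure described above can be written out as a short sketch. The heights, the function names, the two-sided comparison and the (count + 1)/(N + 1) convention are assumptions made for illustration.

```python
import random
import statistics


def mean_difference(a, b):
    """Difference in group means, Ha - Hb."""
    return statistics.mean(a) - statistics.mean(b)


def permutation_p_mean_diff(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value using the mean difference as the statistic."""
    rng = random.Random(seed)
    d_observed = abs(mean_difference(a, b))      # D1
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):                      # D2 ... Dn
        rng.shuffle(pooled)
        d_perm = abs(mean_difference(pooled[:len(a)], pooled[len(a):]))
        if d_perm >= d_observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical heights in cm for two classes of six students.
class_a = [172.1, 168.4, 175.0, 170.3, 169.8, 173.5]
class_b = [171.0, 167.9, 174.2, 169.5, 170.8, 172.6]

print(permutation_p_mean_diff(class_a, class_b))
```

The only change needed to use a t-statistic instead is to swap `mean_difference` for a function that also divides by the estimated standard error.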

Q2.

Thank you very much for this information. I will read their articles.

Best regards,

Takafumi Yano

Kyoto University, Japan.

Jul 31, 2017 02:07 AM | Takafumi Yano - *Kyoto University*

RE: FDR: Why the nonparametric p is used?

Hi Andrew,

Sorry, this is my last question.

Q3

I want to know why you use nonparametric p-values.

In the first step you calculate the t-values, and you could calculate p-values from them using the t-distribution. So I think you could simply use these p-values with the Simes procedure.

Please tell me why you did not use these parametric p-values.

Best regards,

Takafumi Yano

Kyoto University, Japan.

Jul 31, 2017 05:07 AM | Andrew Zalesky

RE: FDR: Why the nonparametric p is used?

Hi Takafumi,

The procedure you describe is ok in most cases. However, your difference measure is not a pivotal statistic. Statisticians prefer to use a pivotal statistic for permutation.

To understand why your measure is not pivotal, consider what would happen if you measured the difference in heights within a population of very short people. The difference in mean heights would most likely be smaller than in a population of individuals of average height. A pivotal statistic/quantity (e.g. t-stat, z-stat, etc.) will not differ based on the population (e.g. very short versus average height).

In summary, the procedure you propose is ok, but most statisticians would prefer to use a pivotal statistic for permutation because this can alleviate issues for more complex designs involving multiple comparisons.
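One facet of this can be checked numerically. In the sketch below, the shorter population is modelled, purely as an assumption, by multiplying every height by the same factor: the mean difference shrinks by that factor, while the t-statistic is unchanged. The data and function names are hypothetical, and scale invariance is only part of what "pivotal" means.

```python
import math
import statistics


def mean_difference(a, b):
    """Difference in group means."""
    return statistics.mean(a) - statistics.mean(b)


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return mean_difference(a, b) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical heights in cm for two groups.
tall_a = [172.0, 168.5, 175.0, 170.5, 169.0, 173.5]
tall_b = [165.0, 167.5, 163.0, 166.5, 164.0, 168.0]

scale = 0.5   # assumed scaling to mimic a much shorter population
short_a = [x * scale for x in tall_a]
short_b = [x * scale for x in tall_b]

print("mean difference:", mean_difference(tall_a, tall_b), mean_difference(short_a, short_b))
print("t-statistic:    ", pooled_t(tall_a, tall_b), pooled_t(short_a, short_b))
```

For a single two-group comparison, the pooled-variance t-statistic is a monotone function of the mean difference once the pooled data are fixed, so the two permutation p-values coincide. The distinction matters when statistics from many edges with different variances are compared or combined.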

Andrew

Jul 31, 2017 05:07 AM | Andrew Zalesky

RE: FDR: Why the nonparametric p is used?

Hi Takafumi,

Non-parametric inference (permutation testing in this case) avoids making assumptions about normality.

On the other hand, permutation testing makes other assumptions about exchangeability and variance.

There is nothing wrong with computing a p-value parametrically as long as the parametric null distribution is justified.

Andrew

Jul 31, 2017 04:07 PM | Takafumi Yano - *Kyoto University*

RE: FDR: Why the nonparametric p is used?

Hi Andrew

Thank you for your detailed explanation.

It's all clear!

Thank you so much!

Takafumi Yano
