Sep 27, 2011  05:09 PM | Maarten Mennes
RE: Distribution of Test Labels

Originally posted by Satrajit Ghosh:
i completely agree with you that each submission and iteration will slowly feed bias into the methods, but hopefully one can slow this process down sufficiently to allow new datasets to come online in the interim. it should not be held away for ever, but just enough to increase the robustness of algorithms while providing time for new data sets to be collected.

cheers,

satra

I think this quote by Satra nicely sums it all up. While there is an obvious benefit in gathering a new test sample, it might not be feasible in the short run. What is more likely to happen is that other sites will want to contribute their data. At that point, we could split each contribution into a training part and a withheld test part. Both the training and the test set would then grow on a continuous basis (and yes, at that point older test sets can be exposed), further reducing the possibility of overfitting.
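
To make the per-site split concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the function names (`split_site`, `add_site`), the 30% test fraction, the site names, and the subject IDs are all hypothetical and not part of any agreed procedure.

```python
import random


def split_site(subject_ids, test_fraction=0.3, seed=0):
    """Split one site's subjects into a training part and a withheld test part.

    test_fraction and seed are illustrative defaults, not agreed values.
    """
    ids = sorted(subject_ids)          # fixed starting order, so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (train, test)


def add_site(pool, site, subject_ids, **kwargs):
    """Add a newly contributed site to the growing training and test pools."""
    train, test = split_site(subject_ids, **kwargs)
    pool["train"][site] = train
    pool["test"][site] = test          # labels for this part stay withheld
    return pool


# Hypothetical contributions from two sites.
pool = {"train": {}, "test": {}}
add_site(pool, "site_A", [f"A{i:03d}" for i in range(10)])
add_site(pool, "site_B", [f"B{i:03d}" for i in range(20)], seed=1)
```

Splitting within each site, rather than over the pooled subjects, means every contributing site ends up in both the training set and the withheld test set.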

Maarten
