Sep 27, 2011 05:09 PM | Maarten Mennes
RE: Distribution of Test Labels
Originally posted by Satrajit Ghosh:
i completely agree with you that each submission
and iteration will slowly feed bias into the methods, but hopefully
one can slow this process down sufficiently to allow new datasets
to come online in the interim. it should not be held away for ever,
but just enough to increase the robustness of algorithms while
providing time for new data sets to be collected.
cheers,
satra
I think this quote by Satra nicely sums it all up. While there is an obvious benefit in gathering a new test sample, it might not be feasible in the short run. What is more likely to happen is that other sites will want to contribute their data. At that point, we could split each contribution into a training part and a withheld test part. Both the training set and the test set would then grow on a continuous basis (and yes, at that point older test sets can be exposed), further reducing the possibility of overfitting.
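To make the splitting idea concrete, here is a rough sketch of how a new site's contribution could be divided. Nothing here is an agreed procedure: the function names, the 30% withheld fraction, the fixed seed, and the example site name are all assumptions for illustration only.

```python
import random


def split_contribution(subject_ids, test_fraction=0.3, seed=0):
    """Split one site's subjects into (training part, withheld test part).

    test_fraction and seed are illustrative assumptions, not agreed values.
    """
    rng = random.Random(seed)      # local RNG so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def add_site(training_pool, withheld_pool, site_name, subject_ids,
             test_fraction=0.3, seed=0):
    """Return new, larger pools after adding one site's contribution."""
    train, test = split_contribution(subject_ids, test_fraction, seed)
    new_training = training_pool + [(site_name, s) for s in train]
    new_withheld = withheld_pool + [(site_name, s) for s in test]
    return new_training, new_withheld


# Hypothetical site contributing ten subjects
training, withheld = add_site([], [], "site_A", range(10))
```

Each further call to `add_site` grows both pools, so an older withheld part could be released once a newer one is in place.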
Maarten
Threaded View
| Author | Date |
|---|---|
| Paolo Avesani | Sep 15, 2011 |
| John Muschelli | Sep 28, 2011 |
| Paolo Avesani | Sep 26, 2011 |
| Satrajit Ghosh | Sep 27, 2011 |
| Maarten Mennes | Sep 27, 2011 |
| Arno Klein | Sep 27, 2011 |
| Huiguang He | Sep 17, 2011 |
| Maarten Mennes | Sep 26, 2011 |
