RE: Distribution of Test Labels
Sep 27, 2011 05:09 PM | Arno Klein
I believe it would be a shame to release the data and end the challenge, and I completely agree with Satra that there are straightforward ways of foiling overfitting. One consideration is to require user registration before site visitors can upload their results (rather than logging an IP address), and to post their results with a timestamp. Another is to ask each visitor to classify and upload results for a random drawing of subjects.
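As a rough illustration of that last point, here is a minimal sketch of how a random drawing could be tied to a registration; the username, subject ID format, and subset size are all hypothetical.

```python
import hashlib
import random

def draw_subjects(username, all_subject_ids, n_subjects):
    """Draw a reproducible random subset of test subjects for one registered user.

    The seed is derived from the username, so a given user always receives
    the same subjects, while different users receive different subsets.
    """
    seed = int(hashlib.sha256(username.encode("utf-8")).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(all_subject_ids), n_subjects))

# hypothetical example: each registered user classifies 50 of the test subjects
subject_ids = ["sub-%04d" % i for i in range(1, 198)]
print(draw_subjects("some_user", subject_ids, 50)[:5])
```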
cheers,
@rno
Originally posted by Satrajit Ghosh:
dear paolo and others:
I don't believe setting up a service for subsequent submissions would be a good idea. There is a risk of directing a lot of effort toward overfitting the test set without gaining any additional insight into either ADHD or the computational methods.
--
i think overfitting will be really hard to pull off with very simple tricks in place, for example (a rough sketch of how these checks could be wired up follows this list):
- no submissions from the same ip within 24 hours or so (if you are really looking to improve and not overfit, you don't need to submit that often)
- allow only a subset of the test data to be used for evaluation during a given period (say, a month) and change this subset from month to month
- one only gets back a confusion matrix (no per-subject error labels)
- one does have a training set that's fairly large (so for insight into either the algorithms or adhd, that should be enough while the data continue to grow)
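A minimal sketch of how these checks might fit together; the in-memory ip log, the three-way monthly rotation, and the label format are all assumptions:

```python
import datetime
from collections import Counter

SUBMISSION_LOG = {}  # ip address -> time of last accepted submission (assumed in-memory store)
MIN_INTERVAL = datetime.timedelta(hours=24)

def rate_limited(ip, now=None):
    """Return True if this ip already submitted within the last 24 hours."""
    now = now or datetime.datetime.utcnow()
    last = SUBMISSION_LOG.get(ip)
    if last is not None and now - last < MIN_INTERVAL:
        return True
    SUBMISSION_LOG[ip] = now
    return False

def monthly_subset(subject_ids, now=None):
    """Evaluate against a different third of the test subjects each month."""
    now = now or datetime.datetime.utcnow()
    k = now.month % 3
    return [s for i, s in enumerate(sorted(subject_ids)) if i % 3 == k]

def confusion_matrix(true_labels, predicted_labels):
    """Return only aggregate (true label, predicted label) counts, no per-subject errors."""
    return dict(Counter(zip(true_labels, predicted_labels)))
```

In practice the submission log and the subset schedule would presumably live in whatever database the site already uses rather than in memory.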
--
Last but not least, I wouldn't neglect the cost of delivering and maintaining an effective submission service. The scenario could become even worse if we consider that the original data might be extended or corrected, both the training and the test sets.
--
this would be a decision for the adhd200 folks to make. however, at its simplest it would be a web service that accepts a post request carrying a file of predicted labels and returns a json-formatted confusion matrix, while checking the ip or an authentication (via openid).
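For concreteness, a minimal sketch of such a service, assuming flask and a two-column csv upload; the endpoint name and file format are purely illustrative:

```python
import csv
import io
from collections import Counter

from flask import Flask, jsonify, request

app = Flask(__name__)

# hypothetical: true diagnoses for the held-out subjects, loaded once at startup
TRUE_LABELS = {}  # e.g. {"sub-0001": "ADHD", "sub-0002": "TDC", ...}

@app.route("/submit", methods=["POST"])
def submit():
    ip = request.remote_addr  # or verify an openid-authenticated user instead
    # rate limiting and monthly subset rotation checks would go here

    # the upload is assumed to be a two-column csv: subject_id,predicted_label
    uploaded = request.files["labels"]
    reader = csv.reader(io.StringIO(uploaded.read().decode("utf-8")))
    predictions = {row[0]: row[1] for row in reader if row}

    # return only the aggregate confusion matrix, never per-subject errors
    counts = Counter(
        (TRUE_LABELS[sid], predictions[sid])
        for sid in predictions
        if sid in TRUE_LABELS
    )
    matrix = {"%s|%s" % (true, pred): n for (true, pred), n in counts.items()}
    return jsonify(matrix)

if __name__ == "__main__":
    app.run()
```

A submitter would then just post their csv (for example with curl -F labels=@predictions.csv against the service) and parse the returned json counts.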
--
I would recommend collecting a new test set for a second round of the competition, since I consider your effort to assemble the ADHD dataset really valuable.
--
i think this is really hard, but hopefully it is getting easier with the efforts of the FCP/INDI folks. however, one can easily see components of the existing test set becoming part of the training set whenever new data are added to the test set.
i really applaud the organizers for having collected the amount of data they have while still keeping a relatively large test set. i personally would appreciate having the opportunity to test improvements in algorithms. i completely agree with you that each submission and iteration will slowly feed bias into the methods, but hopefully one can slow this process down enough to allow new datasets to come online in the interim. the test labels should not be held back forever, just long enough to increase the robustness of the algorithms while providing time for new data sets to be collected.
cheers,
satra
Threaded View

| Author | Date |
|---|---|
| Paolo Avesani | Sep 15, 2011 |
| John Muschelli | Sep 28, 2011 |
| Paolo Avesani | Sep 26, 2011 |
| Satrajit Ghosh | Sep 27, 2011 |
| Maarten Mennes | Sep 27, 2011 |
| Arno Klein | Sep 27, 2011 |
| Huiguang He | Sep 17, 2011 |
| Maarten Mennes | Sep 26, 2011 |
