Sep 27, 2011  05:09 PM | Arno Klein
RE: Distribution of Test Labels
I believe it would be a shame to release the data and end the challenge, and I completely agree with Satra that there are straightforward ways of foiling overfitting.  One option is to require user registration before site visitors can upload their results (rather than just logging an IP address), and to post their results with a timestamp.  Another is to ask each visitor to classify and upload results for a random drawing of subjects.

cheers,
@rno

Originally posted by Satrajit Ghosh:
dear paolo and others:

--
I don't believe setting up a service for subsequent submissions would be a good idea. There is a risk of channelling a lot of effort into simply overfitting the test set without achieving any additional insight into either ADHD or the computational methods.
--

i think a few very simple tricks would make overfitting really hard. for example:
- no submissions from the same ip within 24 hours or so (if you are really looking to improve rather than overfit, you don't need to submit that often)
- allow only a subset of the test data to be used for evaluation during a given period (say a month) and change this subset from month to month (see the sketch after this list)
- return only a confusion matrix (no per-subject error labels)
- the training set is already fairly large (so for algorithmic/adhd insight it should provide enough, while the data continue to grow)
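
as a rough sketch (in python; the seeding scheme and subset fraction here are my own assumptions, not an agreed-upon design), the monthly rotation could be as simple as deriving a deterministic random subset from the current year and month:

import hashlib
import random
from datetime import date

def monthly_eval_subset(all_test_ids, fraction=0.5, today=None):
    # derive a deterministic seed from the current year and month, so every
    # submission within a month is scored on the same subset, and the subset
    # changes automatically at the start of the next month
    today = today or date.today()
    seed = int(hashlib.sha1(f"{today.year}-{today.month}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    ids = sorted(all_test_ids)
    k = max(1, int(len(ids) * fraction))
    return set(rng.sample(ids, k))

each submission would then be scored only against the subject ids in the current month's subset.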

--
Last but not least, I wouldn't neglect the cost of delivering and maintaining an effective submission service. The scenario could become even worse if we consider that the original data might be extended or corrected, both the training and the test data.
--

this would be a decision for the adhd200 folks to make. however, at its simplest it would be a web service that responds to a post request carrying a file of predicted labels by returning a json-formatted confusion matrix, while checking the requester's ip or requiring authentication (via openid).
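
as a rough illustration of how little code that takes, here is a minimal sketch in python using flask (the endpoint name, csv upload format, and 24-hour per-ip throttle are assumptions on my part, not an agreed-upon design):

import csv
import io
import time
from collections import defaultdict
from flask import Flask, request, jsonify

app = Flask(__name__)

# true diagnostic labels for the held-out test set, loaded at startup
# (the source and label values here are hypothetical)
TRUE_LABELS = {}   # subject_id -> label, e.g. "ADHD" / "TDC"
LAST_SEEN = {}     # ip address -> timestamp of last accepted submission

@app.route("/submit", methods=["POST"])
def submit():
    ip = request.remote_addr
    # simple throttle: at most one submission per ip per 24 hours
    if time.time() - LAST_SEEN.get(ip, 0) < 24 * 3600:
        return jsonify(error="one submission per 24 hours"), 429

    # expect an uploaded csv with rows of: subject_id, predicted_label
    rows = csv.reader(io.StringIO(request.files["labels"].read().decode()))
    predictions = {subject_id: label for subject_id, label in rows}

    # build a confusion matrix only; no per-subject feedback is returned
    confusion = defaultdict(int)
    for subject_id, true_label in TRUE_LABELS.items():
        predicted = predictions.get(subject_id, "missing")
        confusion[f"{true_label} -> {predicted}"] += 1

    LAST_SEEN[ip] = time.time()
    return jsonify(confusion_matrix=dict(confusion))

a participant would then submit with something like
curl -F labels=@predictions.csv http://<server>/submit
and get back only the aggregated counts.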

--
I would recommend collecting a new test set for a second round of the competition, since I consider your effort to assemble the ADHD dataset really valuable.
--

i think this is really hard, but hopefully it is getting easier with the efforts of the FCP/INDI folks. however, one can easily see components of the existing test set becoming part of the training set whenever new data are added to the test set.

i really applaud the organizers for having collected the amount of data they have while still keeping a relatively large test set. i personally would appreciate the opportunity to test improvements in algorithms. i completely agree with you that each submission and iteration will slowly feed bias into the methods, but hopefully one can slow this process down enough to allow new datasets to come online in the interim. the test data should not be held back forever, just long enough to increase the robustness of the algorithms while providing time for new datasets to be collected.

cheers,

satra
