WhiteText - annotated neuroscience text

WhiteText is a corpus of manually annotated brain region mentions, created to facilitate text mining of the neuroscience literature. The corpus contains 1,377 abstracts with 17,585 brain region annotations. Interannotator agreement, evaluated on a subset of the documents, was 90.7% under strict matching and 96.7% under lenient matching. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words.
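To illustrate the difference between strict and lenient matching, here is a minimal Python sketch. It rests on assumptions that this README does not confirm: "strict" is taken to mean identical span boundaries and "lenient" to mean any character overlap, which is a common convention in entity recognition evaluation. The function names and the toy spans are hypothetical, and the agreement figures reported above may have been computed with a different measure.

```python
# Spans are (start, end) character offsets, end exclusive.
# Assumption: strict = identical boundaries, lenient = any overlap.

def overlaps(a, b):
    """True if half-open spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def agreement(spans_a, spans_b, lenient=False):
    """Fraction of annotator A's spans matched by some span of annotator B."""
    if not spans_a:
        return 0.0
    if lenient:
        matched = sum(any(overlaps(a, b) for b in spans_b) for a in spans_a)
    else:
        set_b = set(spans_b)
        matched = sum(a in set_b for a in spans_a)
    return matched / len(spans_a)

# Toy example (invented offsets, not taken from the corpus).
annotator_a = [(0, 11), (25, 40), (52, 60)]
annotator_b = [(0, 11), (30, 40), (70, 80)]

strict = agreement(annotator_a, annotator_b)
lenient = agreement(annotator_a, annotator_b, lenient=True)
print(f"strict={strict:.3f} lenient={lenient:.3f}")
```

In this toy example the second pair of spans differs only in its start offset, so it counts as a match under lenient matching but not under strict matching. That kind of boundary disagreement would be consistent with lenient agreement being higher than strict agreement.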

The corpus can be found at http://www.chibi.ubc.ca/WhiteText/

A previous evaluation of automated recognition of these mentions is described in:
"Automated Recognition of Brain Region Mentions in Neuroscience Literature"
by French, Lane, Xu and Pavlidis.
