NITRC: CIFTI Connectivity File Format: open-discussion

CIFTI 2.0: a heads-up

The CIFTI 1.0 data file format was introduced in 2011 and is used extensively in the datasets publicly released by the Human Connectome Project (HCP, http://www.humanconnectome.org/). In the course of these efforts, the HCP software development team has encountered several limitations with the CIFTI 1.0 file format. We are currently drafting a proposal for CIFTI 2.0 that will include a number of refinements and new features useful for handling fMRI, MEG and tractography data. The proposed changes are relatively modest in scope, but they are not backwards-compatible with CIFTI 1.0. Hence, it is preferable to increment the version number.

A draft document for CIFTI 2.0 will be circulated soon to this forum, to the original CIFTI working group, and to others who request to join by responding to this forum. By this message, we encourage suggestions from others regarding desired design features or constraints that should be considered for the new CIFTI 2.0 format.

Several recent publications in a special issue of NeuroImage illustrate the utility of the CIFTI format for HCP data (Glasser et al., http://www.ncbi.nlm.nih.gov/pubmed/23668...; Smith et al., http://www.ncbi.nlm.nih.gov/pubmed/23702...; and Barch et al., http://www.ncbi.nlm.nih.gov/pubmed/23684...).

RE: CIFTI 2.0: a heads-up

Dear members of the CIFTI working group:

In the brain imaging community we have had a strong resistance to adopting HDF5 as the underlying format for our data. This is the case, even though this format is readily readable and writable by C++, Python, R, Matlab, Java and others, and has been extensively used by communities that generate a lot of data. Furthermore, this format allows extensive structured metadata to be stored in the header, serializable compression of datasets and parallel access to datasets even under compression. Such a format would allow web services to return queries on subsets of the data without having to ship the entire dataset.

Native readers and writers in high-level languages would make it significantly easier to work with the Connectome project data in CIFTI format, compared with the current situation, in which one works primarily through the Connectome Workbench.

Now there are many other solutions to the technical problem of storing scientific data, but it seems HDF5 might be one that has been reasonably well tested in large data domains.

I would encourage the working group to consider the option, and if they already have and decided not to use it, at least to inform the community as to why HDF5 is not applicable to the problem of storing brain imaging data.

RE: CIFTI 2.0: a heads-up

Although I am relatively new to neuroimaging, I have past experience with large datasets and high performance computing and find myself in very strong agreement with the argument in favour of considering HDF5. Indeed, for our own neuroimaging connectivity work, my lab uses a custom (read: hacked together) HDF5-based format.

As one's datasets grow larger, so do the benefits of HDF5; of particular note is the fact that HDF5 works nicely in a parallel distributed-memory computing environment and has been fairly thoroughly tested in that domain. Parallel file I/O is a nontrivial endeavour and it makes life quite a bit easier to have a file format that brings with it a strong library for facilitating this.

RE: CIFTI 2.0: a heads-up

This posting responds to Drs. Ghosh and Daley, who appropriately asked why the CIFTI format is based on NIFTI-2 rather than HDF5 and whether this is a reversible decision.

The original decision made by the CIFTI working group in 2011 was based on several considerations and was made after considering HDF5 as an alternative.

NIFTI-1 is a widely accepted standard in the neuroimaging community and is supported by most brain imaging software platforms. However, NIFTI-1 was not suitable for the proposed CIFTI format, primarily because it stores dimension lengths as 16-bit integers, which caps each dimension at 32,767 elements. In contrast, the dimension fields in the NIFTI-2 file header (adopted by the NIFTI committee in 2011) are 64-bit integers, allowing nearly unlimited dimension length. This modification made NIFTI-2 an attractive option for CIFTI because it met the core requirements, was substantially simpler to implement than HDF5, and offered an easier path to adoption for NIFTI-compliant and GIFTI-compliant brain imaging platforms.
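To make the dimension limit concrete, here is a minimal Python sketch that uses only the standard `struct` module. It checks whether a dimension length fits in a signed 16-bit field (as in NIFTI-1) or a signed 64-bit field (as in NIFTI-2). The helper name `fits` and the example length are illustrative assumptions, not part of any NIFTI or CIFTI library; 91,282 is used only as an example of a length above 32,767.

```python
import struct

INT16_MAX = 2**15 - 1  # 32,767, the largest value a signed 16-bit field can hold


def fits(fmt, value):
    """Return True if `value` can be packed with the given struct format.

    `fits` is a hypothetical helper for this illustration only.
    """
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# Illustrative dimension length larger than 32,767 (an assumption for the demo).
example_length = 91282

# "<h" = little-endian signed 16-bit, "<q" = little-endian signed 64-bit.
fits_in_16_bit = fits("<h", example_length)
fits_in_64_bit = fits("<q", example_length)

print("fits in 16-bit field:", fits_in_16_bit)
print("fits in 64-bit field:", fits_in_64_bit)
```

The sketch only demonstrates the integer-width constraint; it does not read or write an actual NIFTI header.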

The specific changes soon to be proposed for CIFTI 2.0 involve refinements and clarifications internal to the CIFTI XML metadata structure. The issues these changes address would not be resolved or simplified by switching to an alternative format such as HDF5. NIFTI-2 remains an attractive format for CIFTI 2.0, as it provides the most straightforward path for ongoing CIFTI development.

We appreciate that HDF5 offers advantages when dealing with complex datasets that can capitalize on the rich hierarchical organization inherent in HDF5. If the evolving data requirements for human structural and functional connectivity analyses would benefit strongly from HDF5, perhaps it might provide the substrate on which a future CIFTI 3.0 could be based. However, conversion to HDF5 would be a major undertaking, requiring group consensus on a wide range of format specifications needed to make implementation feasible. Hence, while it is important to remain genuinely open to this possibility in the future, HDF5 for CIFTI is not a route to be undertaken lightly.

David Van Essen, for the HCP software development team