Feb 15, 2016 04:02 PM | Bob Kraft
Changing data's location
I am looking for some advice on how to effectively use our HPC
cluster. Our HPC cluster only has temporary storage for data
processing. It is intended to be used for batch processing and not
for interactive data processing or data visualization. Our data is
stored on a separate computer and network disk space. With
this setup I was planning to do the following:
1) rsync data from network disk to HPC cluster temporary storage
2) Process each subject individually via conn_batch (preprocessing, setup, denoising, and first level analysis)
3) rsync data from HPC cluster temporary storage back to our permanent network disk
4) merge individual subjects into a single project
5) perform second level analysis.
For 64 subjects with pre and post scans, I am able to do steps 1-3 in about an hour (15 minutes setup, 45 minutes compute time). Although this processing stream may not be ideal, it works for me.
The problem I am having is with merging the files. Since the individual folders created by CONN have moved, CONN needs to confirm the location of each individual's data and the location of the ROIs. Doing this manually takes me about 1-2 minutes per subject, which adds up to about 1 to 2 hours of just clicking buttons.
Is there any way to automate this process in CONN?
Thanks for your help,
Bob
Feb 15, 2016 04:02 PM | Jalmar Teeuw
RE: Changing data's location
You could try a 'search & replace' on the CONN_x structure stored
in the project file. For
example: http://nl.mathworks.com/matlabcentral/fileexchange/27858-struct-string-replace
I didn't Google when we ran into the same issue, so instead of using an existing Matlab package I wrote a small Matlab function that does the same thing (see files in attached zip file). If you perform a search & replace, be sure to add SPM and CONN to Matlab's PATH; otherwise you might get warnings about Matlab not understanding 'nifti' structures when loading the project file, and it may not traverse these structures during the find & replace.
In case you want to use the attached scripts, you would run
find_and_replace_matfile(source_filename, target_filename, needle, replace)
Where source_filename is the path to the CONN conn_*.mat project file you wish to alter.
The target_filename is the path where you would like to save the modified version of the CONN project file (best to use a different target filename than the source, so you can check that the search & replace worked before you remove the original CONN project file).
The needle is what you want to search for, in this case the path where the CONN project was stored (usually it contains the conn_*.mat project file and folder); e.g. the string '/mnt/data/HPC-cluster/temp/username/projects/conn-subject-001'
The replace is what you want the needle to be replaced with, in this case the location where the project is currently stored; e.g. the string '/Users/username/Projects/conn-subject-001'
In addition to replacing the root of the project folder, we also had to replace the path to the atlas ROI used in CONN, i.e. replace something like '/mnt/home/username/Matlab/CONN/' with '/Users/username/Matlab/CONN/'. You just repeat the find & replace until CONN is satisfied that it has found all the files.
Attached is a quick & dirty recursive search & replace function that works for our project files. However, it may not work on every project, so it may be better to use one of the existing (and more thoroughly tested) packages from Matlab Central. Nevertheless, the idea is the same.
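For readers without the attachment, the idea can be sketched roughly as follows. This is not the attached script: replace_paths is a hypothetical helper name, and the sketch assumes the paths are stored as plain char arrays inside nested structs and cell arrays. Objects such as 'nifti' are not traversed here, so paths held inside them would be left unchanged.

```matlab
function s = replace_paths(s, needle, replacement)
% REPLACE_PATHS  Recursively replace NEEDLE with REPLACEMENT in every
% char array found inside nested structs and cell arrays.
% (Hypothetical helper for illustration; save as replace_paths.m.)
if ischar(s)
    if isempty(s) || size(s, 1) == 1
        % Ordinary string (row vector) or empty char
        s = strrep(s, needle, replacement);
    else
        % Multi-row char matrix: treat each row as its own string.
        % Note that cellstr drops trailing padding and char re-pads.
        s = char(strrep(cellstr(s), needle, replacement));
    end
elseif iscell(s)
    for k = 1:numel(s)
        s{k} = replace_paths(s{k}, needle, replacement);
    end
elseif isstruct(s)
    names = fieldnames(s);
    for k = 1:numel(s)              % also covers struct arrays
        for f = 1:numel(names)
            s(k).(names{f}) = replace_paths(s(k).(names{f}), needle, replacement);
        end
    end
end
% Any other type (numeric, logical, objects) is returned unchanged.
end
```

A possible way to use it, with the same four arguments as above: data = load(source_filename); data = replace_paths(data, needle, replace); save(target_filename, '-struct', 'data'); which writes the modified variables to a new file and leaves the original project file untouched.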
Cheers,
Jalmar
Feb 17, 2016 04:02 PM | Bob Kraft
RE: Changing data's location
Jalmar,
Thanks for the detailed reply. I will take a look at your script and give it a try.
Bob
Feb 17, 2016 05:02 PM | Alfonso Nieto-Castanon - Boston University
RE: Changing data's location
Hi Bob & Jalmar,
In case this helps, and following Jalmar's example, I am also attaching a patch that allows you to handle those folder name changes programmatically (this patch is for release 15h; simply copy the attached file to the conn distribution folder, overwriting the file with the same name there). CONN already had some search/replace capability, which was used, for example, to let you enter through the GUI just the location of the first missing subject/session datafile; CONN would then automatically generate a search/replace pattern from the entered info and attempt to apply that same transformation to the rest of the missing subject/session datafiles. What I added in this patch is just the ability to: 1) programmatically define those search/replace patterns as well; and 2) have CONN "remember" those search/replace patterns across different projects (e.g. when merging multiple projects).
For example, if your data in the HPC cluster was located in a /tmp/data folder and in your main system it is located in a /projects/myproject/data folder, then after rsyncing your project back to your main system, you could now type (or include in your scripts):
conn_updatefilepaths('init', '/tmp/data','/projects/myproject/data');
right before loading and/or merging your project files in CONN. This will tell CONN that at any future time (within the current Matlab session), when CONN is loading a project or merging several projects, if it finds some missing files in any /tmp/data[SOMETHINGELSE] location, it should first try to see if those files exist in the location /projects/myproject/data[SOMETHINGELSE], and if they can be found there then CONN will automatically fix those references without prompting you to locate those files (if they do not exist in the new location or if the missing files do not match the /tmp/data* pattern CONN will still ask you to locate those missing files as it normally does).
In general, the syntax is:
conn_updatefilepaths('init', root_searchstring, root_replacestring)
where root_searchstring and root_replacestring may be strings (for a single folder name change), or cell arrays of strings (for multiple folder name changes), to define programmatically potential search/replace patterns.
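As a rough sketch of the cell-array form, the two folder changes mentioned in this thread (the data folder and the CONN atlas folder) could be registered in one call. The paths are only illustrative, and the sketch assumes that entry k of the first cell array is paired with entry k of the second. The call is guarded so that nothing happens if the patched function is not on the Matlab path.

```matlab
% Illustrative old/new folder roots taken from the examples in this
% thread; substitute your own. Assumption: entry k of search_roots is
% replaced by entry k of replace_roots.
search_roots  = {'/tmp/data', '/mnt/home/username/Matlab/CONN/'};
replace_roots = {'/projects/myproject/data', '/Users/username/Matlab/CONN/'};

% Only call into CONN if conn_updatefilepaths is on the Matlab path.
if exist('conn_updatefilepaths', 'file')
    conn_updatefilepaths('init', search_roots, replace_roots);
    % ... load or merge your project files here ...
else
    warning('conn_updatefilepaths not found; add CONN to the Matlab path.');
end
```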
Just for reference, you may also use either the syntax:
conn_updatefilepaths('init',{},{});
or equivalently the syntax:
conn_updatefilepaths('hold','off');
to "forget" from this point on any potential search/replace patterns that you may have entered before. Or the syntax:
conn_updatefilepaths('hold','on');
to have CONN "remember" from this point on any potential search/replace patterns that you may implicitly define through the GUI (when prompted to select the location of a missing file) without actually suggesting any initial search/replace patterns programmatically to begin with. This is useful, for example, if you load a project and manually fix some folder reference when prompted by the GUI, and after that you load a second project and you do not want to have to enter exactly the same folder-name change that you already fixed in the previous project.
Hope this helps and let me know if you run into any issues and/or if you would like me to further clarify any of the above.
Best
Alfonso
Mar 21, 2019 01:03 AM | Benson Stevens
RE: Changing data's location
I was wanting to know this as well. I couldn't get the conn scripts
to work right. By experimenting I found a different way, and it is
simple as heck. The reason I needed to change path names is that I
updated my mac operating system. Never do that unless you really
need to. My external hard drive is in NTFS, and whatever came with
the hard drive didn't work on the new OS. I found a free program
(iBoysoft), but it changes the name of my hard drive to the program's
own version of it.
Anyway, if you use conn, you also have spm. There is a function in spm called spm_changepath.m which I have used before for spm. In this example, I was just replacing my hard drive's name in every file path with the name that this new NTFS-writing plugin was creating. This assumes you have changed directory to your conn base folder:
spm_changepath('myconnproject.mat', 'Benson', 'iboysoft_ntfs_disk2s2_')
That is, paths like /Volumes/Benson/Imaging_data/....... become /Volumes/iboysoft_ntfs_disk2s2_/Imaging_data/.......
It might yell at you the first time you open the project, saying it can't find some .....dmat file. I think conn is assuming you want to merge files. Just close the error. It will work from there. I just wanted to change the ROIs used at the first level step, and it worked perfectly all the way through to getting 2nd level results.
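If you want to be a bit more careful, something like the following sketch keeps an untouched copy of the project file before changing anything. The backup file name is made up for illustration, and the search pattern is the full volume prefix rather than just 'Benson', on the assumption that a longer pattern is less likely to match unrelated strings elsewhere in the project.

```matlab
project_file = 'myconnproject.mat';          % project file from the example above
backup_file  = 'myconnproject_backup.mat';   % hypothetical backup name

old_root = '/Volumes/Benson/';
new_root = '/Volumes/iboysoft_ntfs_disk2s2_/';

% Only act if both the project file and SPM's spm_changepath are available.
if exist(project_file, 'file') && exist('spm_changepath', 'file')
    copyfile(project_file, backup_file);     % keep an untouched copy first
    spm_changepath(project_file, old_root, new_root);
else
    disp('Project file or spm_changepath not found; nothing changed.');
end
```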
Many Blessings,
Benson