Feb 17, 2016 05:02 PM | Alfonso Nieto-Castanon - Boston University
RE: Changing data's location
Hi Bob & Jalmar,
In case this helps, and following Jalmar's example, I am also attaching a patch that lets you handle those folder name changes programmatically. The patch is for release 15h: simply copy the attached file to the conn distribution folder, overwriting the file with the same name there. CONN already had some search/replace capability. For example, through the GUI you could enter just the location of the first missing subject/session datafile, and CONN would automatically generate a search/replace pattern from that info and attempt to apply the same transformation to the rest of the missing subject/session datafiles. What this patch adds is just the ability to: 1) define those search/replace patterns programmatically as well; and 2) have CONN "remember" those search/replace patterns across different projects (e.g. when merging multiple projects).
For example, if your data in the HPC cluster was located in a /tmp/data folder and in your main system it is located in a /projects/myproject/data folder, then after rsyncing your project back to your main system, you could now type (or include in your scripts):
conn_updatefilepaths('init', '/tmp/data','/projects/myproject/data');
right before loading and/or merging your project files in CONN. This tells CONN that, for the rest of the current Matlab session, whenever it loads a project or merges several projects and finds missing files in any /tmp/data[SOMETHINGELSE] location, it should first check whether those files exist in /projects/myproject/data[SOMETHINGELSE]. If they are found there, CONN will automatically fix those references without prompting you to locate the files. If they do not exist in the new location, or if the missing files do not match the /tmp/data* pattern, CONN will still ask you to locate them as it normally does.
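To make the lookup described above concrete, here is a minimal sketch in plain Matlab of the same idea: a prefix match on the old root, followed by a check that the file exists under the new root. This only illustrates the described behavior and is not CONN's actual implementation; the file name and the variable names are hypothetical.

```matlab
% Sketch only: mimics the described search/replace lookup, not CONN's own code.
old_root = '/tmp/data';                     % folder name used on the HPC cluster
new_root = '/projects/myproject/data';      % folder name on the main system
filename = '/tmp/data/sub01/func.nii';      % hypothetical stored file reference

fixed = false;
% Only attempt a fix if the file is missing and its path starts with old_root
if ~exist(filename, 'file') && strncmp(filename, old_root, numel(old_root))
    % Swap the old root for the new one, keeping the rest of the path
    candidate = [new_root, filename(numel(old_root)+1:end)];
    if exist(candidate, 'file')
        filename = candidate;               % reference fixed automatically
        fixed = true;
    end
end
% If fixed is still false, this is the case where CONN would prompt you
```

If the file is not found under the new root, the reference is left untouched, which corresponds to the case where CONN falls back to asking you for the location.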
In general, the syntax is:
conn_updatefilepaths('init', root_searchstring, root_replacestring)
where root_searchstring and root_replacestring may be strings (for a single folder name change), or cell arrays of strings (for multiple folder name changes), to define programmatically potential search/replace patterns.
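As an illustration of the cell-array form, the sketch below defines two folder name changes at once. The folder names are hypothetical, and it assumes that the two cell arrays are paired element by element (first search string with first replace string, and so on), which is my reading of the syntax above rather than something stated explicitly. The call is guarded so the snippet does nothing when the patched CONN is not on the Matlab path.

```matlab
% Hypothetical folder names, for illustration only
old_roots = {'/tmp/data', '/tmp/rois'};
new_roots = {'/projects/myproject/data', '/projects/myproject/rois'};

% Assumption: patterns are paired by position, so both lists need equal length
assert(numel(old_roots) == numel(new_roots), 'search/replace lists must match');

% Only call into CONN if the (patched) function is available on the path
if exist('conn_updatefilepaths', 'file')
    conn_updatefilepaths('init', old_roots, new_roots);
end
```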
Just for reference, you may also use either the syntax:
conn_updatefilepaths('init',{},{});
or equivalently the syntax:
conn_updatefilepaths('hold','off');
to "forget" from this point on any potential search/replace patterns that you may have entered before. Or the syntax:
conn_updatefilepaths('hold','on');
to have CONN "remember" from this point on any search/replace patterns that you implicitly define through the GUI (when prompted to select the location of a missing file), without suggesting any initial search/replace patterns programmatically. This is useful, for example, if you load a project and manually fix a folder reference when prompted by the GUI, and then load a second project and do not want to enter again the same folder-name change that you already fixed in the previous project.
Hope this helps and let me know if you run into any issues and/or if you would like me to further clarify any of the above.
Best
Alfonso
Originally posted by Bob Kraft:
I am looking for some advice on how to effectively use our HPC cluster. Our HPC cluster only has temporary storage for data processing. It is intended to be used for batch processing and not for interactive data processing or data visualization. Our data is stored on a separate computer and network disk space. With this setup I was planning to do the following:
1) rsync data from network disk to HPC cluster temporary storage
2) Process each subject individually via conn_batch (preprocessing, setup, denoising, and first level analysis)
3) rsync data from HPC cluster temporary storage back to our permanent network disk
4) merge individual subjects into a single project
5) perform second level analysis.
For 64 subjects with pre and post scans, I am able to do steps 1-3 in about an hour (15 minutes setup, 45 minutes computer time). And although this processing stream may not be ideal, it works for me.
The problem I am having is with merging the files. Since the individual folders created by CONN have moved, CONN needs to confirm the location of each individual's data and the location of the ROIs. Doing this manually takes me about 1-2 minutes per subject. This adds about 1 to 2 hours of just clicking buttons.
Is there any way to automate this process in CONN?
Thanks for your help,
Bob
Threaded View
| Author | Date |
|---|---|
| Bob Kraft | Feb 15, 2016 |
| Benson Stevens | Mar 21, 2019 |
| Alfonso Nieto-Castanon | Feb 17, 2016 |
| Jalmar Teeuw | Feb 15, 2016 |
| Bob Kraft | Feb 17, 2016 |
