Jan 26, 2021 01:01 AM | Alfonso Nieto-Castanon - Boston University
RE: HPC - SLURM - Parallelisation with conn
Dear Sophie,
Let's try to solve one thing at a time. First, BATCH_12012021.mat contains the details of the batch structure that (I imagine) your Lesion*.m script is creating, but the field "filename" of that batch structure (which should point to the name of your CONN project) also points to that same file, "BATCH_12012021.mat". It is not a good idea to give your CONN project the same name as another .mat file which: a) already exists; and b) happens to contain the batch that will in turn create your CONN project. Could you please re-send me your Lesion*.m file? (The version that I have does not do this, so I imagine you are working with a different version.)
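As a minimal sketch of what I mean (the project and batch file names below are just placeholders, adapt them to your own script):

```matlab
% Keep the CONN project name distinct from the .mat file that stores the batch.
% 'conn_LesionProject.mat' is a hypothetical project name; do NOT reuse 'BATCH_12012021.mat'.
clear batch
batch.filename = fullfile(pwd, 'conn_LesionProject.mat'); % CONN project file (new, distinct name)
% ... fill in batch.Setup / batch.Denoising / batch.Analysis as in your Lesion*.m script ...
save('BATCH_12012021.mat', 'batch');  % batch definition saved under a DIFFERENT name
conn_batch(batch);                    % creates/updates conn_LesionProject.mat
```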
And regarding the memory limitations, the node that runs the conn_batch command (part of your Lesion*.m script) should not require any significant memory at all (definitely not over a few GB), since there CONN is only submitting jobs to your cluster. If your IT team is reporting that this node exceeded its allocated memory, there must be some strange recursive loop in which Matlab freezes while draining resources, so I would suggest running the Lesion*.m file interactively and debugging it (just run it one step at a time) to see where it freezes.
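One way to do that interactive debugging (a sketch; "Lesion_script" stands in for whatever your actual Lesion*.m file is called):

```matlab
% In an interactive Matlab session on the submit node:
dbstop if error        % pause at the first error instead of aborting
dbstop in conn_batch   % optional: pause when CONN's batch entry point is reached
Lesion_script          % placeholder for your Lesion*.m script name
% then advance with dbstep / dbcont to see which step freezes or grows in memory
```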
Hope this helps
Alfonso
Originally posted by sophieb:
Dear Alfonso, attaching the zip file leads to an error on the forum, as it is too big.
I took the liberty of sending you a SwissTransfer link with the 2 .mat files to your gmail address, or you can download them here: https://www.swisstransfer.com/d/ebfa7328...
best,
s.