Dec 19, 2020 09:12 PM | Alfonso Nieto-Castanon - Boston University
RE: HPC - SLURM - Parallelisation with conn
Hi Sophie,
If the job was stopped without any apparent reason, the most likely cause is that the job scheduler in your cluster environment killed the job because it exceeded the allocated resources (typically either the allocated time or the allocated memory). To fix that you simply need to request more memory and a longer time limit for your jobs. You can do that in CONN's GUI through the 'Tools > HPC options > Configuration' menu: select the profile named 'Slurm computer cluster', and then enter in the 'in-line additional submit options' box the text:
-t 12:00:00 --mem=8Gb
(the above line will request 12 hours and 8 GB per job, which should typically suffice, but feel free to adjust those values if needed)
and then click 'save' (all) to keep that configuration for future jobs as well. If you prefer not to use the GUI, you can also do the same thing from Matlab's command line using the command:
conn_jobmanager options cmd_submitoptions '-t 12:00:00 --mem=8Gb' saveall
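For reference, `-t` and `--mem` are standard Slurm `sbatch` options, so outside of CONN the same resource request can be expressed directly in a batch script, and Slurm's accounting tools can confirm afterwards whether a killed job hit its time or memory limit. A minimal sketch (the job ID `12345` is a placeholder for the ID Slurm printed when the job was submitted):

```shell
#!/bin/bash
# Equivalent resource request as #SBATCH directives
# (illustrative only; CONN normally generates and submits these scripts for you)
#SBATCH -t 12:00:00     # wall-clock limit: 12 hours
#SBATCH --mem=8G        # memory per node: 8 GB

# After a job has run (or been killed), ask Slurm's accounting database why.
# A State of TIMEOUT suggests raising -t; OUT_OF_MEMORY (or MaxRSS near
# ReqMem) suggests raising --mem.
sacct -j 12345 --format=JobID,State,Elapsed,Timelimit,MaxRSS,ReqMem
```

Note that `sacct` is only available on clusters where Slurm accounting is enabled; on other setups, `seff 12345` or the scheduler's email notifications may provide the same information.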
Let me know if that fixes the issue (and if it does not, please give me more details about the cluster environment you are using, since many of these limitations and policies vary from place to place)
Hope this helps
Alfonso
Originally posted by sophieb:
Hello,
I amended my script as suggested.
The parallelisation seems to have worked (4 .mat files, one for each participant, were produced, as I had 24 cores and 4 participants). However, the analyses stopped without finishing, and I am not sure why. Please find the zipped file enclosed.
Thanks a lot,
Sophie