Dec 18, 2020 04:12 PM | Alfonso Nieto-Castanon - Boston University
RE: HPC - SLURM - Parallelisation with conn
Hi,
Seeing two similar errors with SLURM makes me wonder whether there is something I can change in the code to avoid similar issues for other users as well. If you do not mind helping me debug this, that would be very helpful. To get started, could you please send me the contents of your qlog/201218160009992 (or equivalent) folder? It should contain an "info.mat" file as well as a number of .stdout, .stderr and .stdlog files; please zip those into a single file and share it either here or by email.
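For readers who want to follow this request from a shell on the cluster, here is a minimal bash sketch that packs a whole .qlog job folder into a single archive. The helper name `bundle_qlog` is hypothetical (it is not part of conn), and it uses `tar` rather than `zip` only because `tar` is usually available on Linux compute nodes; a zip file works just as well.

```shell
# Sketch: bundle one conn .qlog job folder (info.mat plus the
# .stdout/.stderr/.stdlog files) into a single .tar.gz for sharing.
# Usage: bundle_qlog JOBDIR [ARCHIVE]
bundle_qlog() {
  if [[ $# -lt 1 ]]; then
    echo "usage: bundle_qlog JOBDIR [ARCHIVE]" >&2
    return 2
  fi
  local jobdir=$1
  local name
  name=$(basename "$jobdir")
  # Default archive name: <job folder name>.tar.gz in the current directory.
  local archive=${2:-$name.tar.gz}

  # Warn (but continue) if the folder does not look like a conn job folder.
  if [[ ! -f "$jobdir/info.mat" ]]; then
    echo "warning: no info.mat in $jobdir" >&2
  fi

  # -C stores paths relative to the parent, so the archive unpacks to <name>/...
  tar -czf "$archive" -C "$(dirname "$jobdir")" "$name" || return 1
  echo "created $archive"
}
```

For the job in this thread that would be `bundle_qlog /scratch/betka/18122020Analysis_3_4.qlog/201218160009992`, which writes `201218160009992.tar.gz` in the current directory.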
Also @sat2020, in addition to the contents of the .qlog folder, if you are using Brown's Oscar CCV, could you please share the contents of the "matlab" and "matlab-threaded" wrappers? The documentation at https://docs.ccv.brown.edu/oscar/matlab/matlab-gui indicates that "matlab" is indeed a wrapper, so knowing exactly what it does would help me figure out how best to fix the -nojvm issue in your environment (as an additional option beyond using the pre-compiled version). In any case, I will also build a new pre-compiled version of 20b for Linux in the next few days, just to offer another alternative.
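One way to collect the wrapper contents is to resolve each command on the PATH and print the file if it is a text script. This is a bash sketch with a hypothetical helper name, `show_wrapper`; it assumes `module load matlab` (or the equivalent on Oscar) has already put the wrapper on the PATH.

```shell
# Sketch: show where a command resolves on the PATH and, if that file is a
# text script (i.e. a wrapper) rather than a binary, print its contents.
show_wrapper() {
  if [[ $# -lt 1 ]]; then
    echo "usage: show_wrapper COMMAND" >&2
    return 2
  fi
  local name=$1 path
  path=$(command -v "$name") || { echo "$name: not found in PATH" >&2; return 1; }
  echo "== $name -> $path"
  # command -v prints a name or alias text (not a file) for aliases,
  # functions and builtins.
  if [[ ! -f "$path" ]]; then
    echo "($name is not a file on disk; try: type $name)"
    return 0
  fi
  # grep -I treats binary files as non-matching, so this succeeds only
  # for non-empty text files.
  if grep -Iq . "$path"; then
    cat "$path"
  else
    echo "($path looks like a binary or is empty, not a wrapper script)"
  fi
}
```

After `module load matlab`, running `show_wrapper matlab` and `show_wrapper matlab-threaded` prints both wrappers, or reports that a name is a binary or is not found.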
Thank you very much
Alfonso
Originally posted by sat2020:
Hello,
I was getting similar "-nojvm" errors with SLURM as well over the past few weeks. We tried a few different ways of starting MATLAB, but the only fix our IT found was to install the 18b Linux standalone version of Conn on the HPC; parallelization now works with that version. I also tend to run 1 node per participant.
Originally posted by sophieb:
Hello,
I am trying to do lesion network mapping using data from the HCP (N=1018), with 31 lesions as seeds. I am running this on an HPC cluster (SLURM).
My code runs well on one or several participants.
I configured the HPC option in conn as explained on your website.
But as soon as I try to use the conn parallelisation option (for 4 subjects, 31 lesions, BATCH.parallel.N=24), the jobs fail. Please find my code enclosed; the job I submitted and its output are below.
Could you help me?
Best regards,
Sophie
------------
JOB SUBMISSION:
#!/bin/bash -l
#SBATCH --chdir /scratch/betka
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --nodes 1
#SBATCH --mem 20G
#SBATCH --time 08:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=xxxx
echo STARTING AT $(date)
module purge
module load matlab
cd /home/betka/Script_SCITAS/Preprocessing/Analyses_Step1
matlab -nodesktop -nojvm -batch 'run LesionNetworkMappingSoso_SCITAS_17122020v3.m'
echo FINISHING AT $(date)
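The error at the end of the output is raised by MATLAB's `figure` function, which the message says is not supported under the `-nojvm` startup option, and the job script above launches MATLAB with `-nojvm`. Before changing anything, it can help to see which launch commands pass that flag, including the node.*.sh scripts conn writes to the .qlog folder. The bash sketch below (hypothetical helper `find_nojvm`) only reports matching lines; whether the generated node scripts contain `-nojvm` depends on how the HPC profile was configured in conn, so treat that as something to check rather than a given.

```shell
# Sketch: report every line that passes -nojvm in the given files
# (e.g. an SBATCH job script and conn's generated node.*.sh scripts).
# Exit status follows grep: 0 if any line matched, 1 if none, 2 on error.
find_nojvm() {
  if [[ $# -eq 0 ]]; then
    echo "usage: find_nojvm FILE..." >&2
    return 2
  fi
  # -H prints the file name and -n the line number for each match;
  # "--" ends option parsing so "-nojvm" is read as the pattern, not an option.
  grep -Hn -- '-nojvm' "$@"
}
```

For example, `find_nojvm job.sh /scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.*.sh`, where `job.sh` is a placeholder for whatever file holds the SBATCH script above.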
OUTPUT
STARTING AT Fre Dez 18 15:59:42 CET 2020
[Warning: Directory already exists.]
[> In LesionNetworkMappingSoso_SCITAS_17122020v3 (line 11)
In run (line 91)]
saved /scratch/betka/18122020Analysis_3_4
saved /scratch/betka/18122020Analysis_3_4
ans =
struct with fields:
names: {1x31 cell}
dimensions: {1x31 cell}
deriv: {1x31 cell}
saved /scratch/betka/18122020Analysis_3_4
saved /scratch/betka/18122020Analysis_3_4
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0001201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0001201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0002201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0002201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0003201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0003201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0004201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0004201218160009992.sh failed
{Error using figure
This functionality is no longer supported under the -nojvm startup option. For more information, see "Changes to -nojvm Startup Option" in the MATLAB Release Notes. To view the release note in your system browser, run web('http://www.mathworks.com/help/matlab/release-notes.html#btsurqv-6', '-browser').
Error in conn_jobmanager>conn_jobmanager_gui (line 1222)
handles.hfig=figure('units','norm','position',[.3 .3 .4 .6],'name',sprintf('job manager (%s)',conn_jobmanager('getprofile')),'numbertitle','off','menubar','none','color','w','visible',visible,'handlevisibility','callback','tag',tag);
Error in conn_jobmanager (line 763)
[info,ok]=conn_jobmanager_gui(info,{},{},'nogui');
Error in conn_batch (line 1653)
if ~isfield(batch.parallel,'immediatereturn')||~batch.parallel.immediatereturn, conn_jobmanager('waitfor',info); end
Error in LesionNetworkMappingSoso_SCITAS_17122020v3 (line 139)
conn_batch(BATCH);
Error in run (line 91)
evalin('caller', strcat(script, ';'));
}
FINISHING AT Fre Dez 18 16:00:15 CET 2020
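The console output only says that each node script "failed"; the reason should be in the per-node files in the .qlog folder. Below is a minimal bash sketch (hypothetical helper `failed_nodes`) that reads a saved copy of this output, picks out the scripts reported as failed, and prints the last lines of each one's .stderr file. It assumes the .stderr file shares the node script's base name, which may not match conn's actual naming, so check the folder contents first.

```shell
# Sketch: list node scripts reported as "failed" in a saved job log and
# show the tail of each one's .stderr file. Requires bash.
# ASSUMPTION: node.XXXX.sh writes its stderr to node.XXXX.stderr in the
# same folder; adjust the name below if your .qlog folder differs.
failed_nodes() {
  if [[ $# -lt 1 ]]; then
    echo "usage: failed_nodes LOGFILE [LINES]" >&2
    return 2
  fi
  local log=$1 lines=${2:-20} script err
  # Matching log lines look like: /path/to/node.0001201218160009992.sh failed
  while read -r script; do
    echo "== $script"
    err="${script%.sh}.stderr"
    if [[ -f "$err" ]]; then
      tail -n "$lines" "$err"
    else
      echo "(no $err found)"
    fi
  done < <(awk 'NF == 2 && $2 == "failed" && $1 ~ /\.sh$/ { print $1 }' "$log")
}
```

By default SLURM writes a job's console output to a file named `slurm-<jobid>.out`, so something like `failed_nodes slurm-<jobid>.out 40` would show the last 40 lines of each failed node's stderr. Paths containing spaces are not handled by this sketch.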