Dec 18, 2020  04:12 PM | Alfonso Nieto-Castanon - Boston University
RE: HPC - SLURM - Parralellisation with conn
Hi,

Two similar errors in SLURM make me wonder whether there is something I can change in the code to avoid similar issues for other users as well. If you do not mind helping me debug this, that would be very helpful. To get started, could you please send me the contents of your qlog/201218160009992 or equivalent folder? It should contain an "info.mat" file as well as a number of .stdout, .stderr, and .stdlog files; please zip those into a single file and share it either here or by email.
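
For anyone gathering these logs, here is a minimal sketch of bundling one job-log folder into a single archive. The function name bundle_qlog and the default archive name are hypothetical, and tar is used in place of zip on the assumption that it is more likely to be present on a cluster node; zip -r would serve the same purpose.

```shell
#!/bin/bash
# Bundle one CONN job-log folder into a single .tar.gz for sharing.
# bundle_qlog is an illustrative helper, not part of CONN.
bundle_qlog() {
    local qlog_dir="$1"
    local out="${2:-conn_qlog_$(basename "$qlog_dir").tar.gz}"
    if [ ! -d "$qlog_dir" ]; then
        echo "no such folder: $qlog_dir" >&2
        return 1
    fi
    # -C keeps only the last path component inside the archive
    tar -czf "$out" -C "$(dirname "$qlog_dir")" "$(basename "$qlog_dir")"
    echo "$out"
}

# Example call, using the folder named in the job output quoted in this thread:
# bundle_qlog /scratch/betka/18122020Analysis_3_4.qlog/201218160009992
```

The function prints the archive path on success, so the result can be attached to a reply or sent by email.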

Also @sat2020, in addition to the contents of the .qlog folder, if you are using Brown's Oscar CCV, could you please share the contents of the "matlab" and "matlab-threaded" wrappers? I see from the documentation at https://docs.ccv.brown.edu/oscar/matlab/matlab-gui that "matlab" indeed appears to be a wrapper, so knowing exactly what it does would help me figure out how best to fix the -nojvm issue in your environment, as an additional option beyond using the pre-compiled version. In any case, I will also build a new pre-compiled version of 20b for Linux in the next few days, just to offer another alternative.
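
One possible way to collect the wrapper contents is sketched below. It assumes the wrappers are plain shell scripts reachable on PATH once the site's MATLAB module is loaded; show_wrapper is a hypothetical helper name.

```shell
#!/bin/bash
# Print where a launcher command lives and, if it is a script, its contents.
# show_wrapper is an illustrative helper; it assumes the command is a file on PATH.
show_wrapper() {
    local name="$1" path
    path=$(command -v "$name") || { echo "$name: not found on PATH" >&2; return 1; }
    echo "== $name -> $path"
    # Wrappers are usually short text scripts starting with "#!"; do not dump binaries.
    if head -c 2 "$path" | grep -q '^#!'; then
        cat "$path"
    else
        echo "(not a script with a #! line; contents not shown)"
    fi
}

# After loading the MATLAB module on the cluster:
# show_wrapper matlab
# show_wrapper matlab-threaded
```

Any -nojvm flag added by the wrapper itself should then be visible in the printed script.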

Thank you very much
Alfonso
Originally posted by sat2020:
Hello,

I was getting similar "-nojvm" errors with SLURM as well over the past few weeks. We tried a few different ways of starting Matlab, but the only fix our IT found was to install the 18b Linux standalone version of Conn on the HPC; parallelization now works with that version. I also tend to run 1 node per participant.


Originally posted by sophieb:
Hello,

I am trying to do lesion network mapping using the data from the HCP (N=1018), with 31 lesions as seeds. I am running this on an HPC cluster (SLURM).

My code runs well on 1 or several participants.
I configured the HPC option in conn as explained on your website.
But as soon as I try to use the conn parallelization option (for 4 subjects, 31 lesions, BATCH.parallel.N=24), the jobs fail. Please find my code enclosed. The job I submitted and its output are below.

Could you help me?
Best regards,
Sophie

------------
JOB SUBMISSION:
#!/bin/bash -l
#SBATCH --chdir /scratch/betka
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 4
#SBATCH --nodes 1
#SBATCH --mem 20G
#SBATCH --time 08:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=xxxx

echo STARTING AT $(date)
module purge
module load matlab
cd /home/betka/Script_SCITAS/Preprocessing/Analyses_Step1
matlab -nodesktop -nojvm -batch 'run LesionNetworkMappingSoso_SCITAS_17122020v3.m'
echo FINISHING AT $(date)




OUTPUT
STARTING AT Fre Dez 18 15:59:42 CET 2020
[Warning: Directory already exists.]
[> In LesionNetworkMappingSoso_SCITAS_17122020v3 (line 11)
In run (line 91)]
saved /scratch/betka/18122020Analysis_3_4
saved /scratch/betka/18122020Analysis_3_4

ans =
struct with fields:
names: {1x31 cell}
dimensions: {1x31 cell}
deriv: {1x31 cell}
saved /scratch/betka/18122020Analysis_3_4
saved /scratch/betka/18122020Analysis_3_4
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0001201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0001201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0002201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0002201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0003201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0003201218160009992.sh failed
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0004201218160009992.sh submitted
/scratch/betka/18122020Analysis_3_4.qlog/201218160009992/node.0004201218160009992.sh failed
{Error using figure
This functionality is no longer supported under the -nojvm startup option. For more information, see "Changes to -nojvm Startup Option" in the MATLAB Release Notes. To view the release note in your system browser, run web('http://www.mathworks.com/help/matlab/release-notes.html#btsurqv-6', '-browser').
Error in conn_jobmanager>conn_jobmanager_gui (line 1222)
handles.hfig=figure('units','norm','position',[.3 .3 .4 .6],'name',sprintf('job manager (%s)',conn_jobmanager('getprofile')),'numbertitle','off','menubar','none','color','w','visible',visible,'handlevisibility','callback','tag',tag);
Error in conn_jobmanager (line 763)
[info,ok]=conn_jobmanager_gui(info,{},{},'nogui');
Error in conn_batch (line 1653)
if ~isfield(batch.parallel,'immediatereturn')||~batch.parallel.immediatereturn, conn_jobmanager('waitfor',info); end
Error in LesionNetworkMappingSoso_SCITAS_17122020v3 (line 139)
conn_batch(BATCH);
Error in run (line 91)
evalin('caller', strcat(script, ';'));
}
FINISHING AT Fre Dez 18 16:00:15 CET 2020
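
In the output above, the four node scripts are already reported as failed before the figure error appears, so the per-node logs may hold a separate clue. A minimal sketch for viewing them, assuming each node writes a .stderr file into the qlog folder; show_node_errors is a hypothetical helper, not a CONN command.

```shell
#!/bin/bash
# Print the last lines of every non-empty .stderr file in a CONN job-log folder.
# show_node_errors is an illustrative helper; the second argument is the
# number of trailing lines to show per file (default 20).
show_node_errors() {
    local qlog_dir="$1" lines="${2:-20}" f found=0
    for f in "$qlog_dir"/*.stderr; do
        [ -s "$f" ] || continue   # skip missing or empty files
        found=1
        echo "== $f"
        tail -n "$lines" "$f"
    done
    [ "$found" -eq 1 ] || echo "no non-empty .stderr files in $qlog_dir"
}

# Example call, using the folder named in the output above:
# show_node_errors /scratch/betka/18122020Analysis_3_4.qlog/201218160009992
```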
