Jan 13, 2021 12:01 PM | sophieb
RE: HPC - SLURM - Parallelisation with conn
Dear Alfonso,
I asked our HPC team for information about the bus error. (As I am not submitting a job directly, I can't get information about what happened myself.)
Here is their reply:
"I looked at the system logs and I think this happened at 11:02 yesterday 12/01:
Jan 12 11:02:24 fidis abrt-hook-ccpp: Process 31401 (MATLAB) of user 204501 killed by SIGBUS - dumping core
Jan 12 11:02:24 fidis abrt-hook-ccpp: Process 30303 (MATLAB) of user 204501 killed by SIGBUS - ignoring (repeated crash)
At the exact same time I see in our monitoring that someone used all the swap of the system (this means someone used more than the memory available to themselves).
In the login nodes there is a limit of 8GB of memory per user. This is in place to protect the login node as it is a shared resource.
I suspect you reached the 8GB limit, then used swap, and when that ran out you got a bus error.
You'll need to connect to a compute node to run your script. I suggest you use a node in the build or debug partition.
After logging in to fidis, simply run something like:
Sinteract -p build -c 10 -m 20G -t 02:00:00
This should normally give you an interactive session in a node rather quickly (as these nodes are usually free). "
I am currently trying this using the .m script option (not the already created .mat file). With Sinteract -p build -c 10 -m 30G -t 02:00:00, nothing is happening so far, which I take to mean I will get the bus error again if I wait.
Edit:
I ran my .m script with Sinteract -p build -c 10 -m 100G -t 02:00:00 and the analyses seem to have worked. I think it would have worked with less than 100G; I guess it was just taking a bit more time.
The problem was related to the memory limit on the login node.
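For anyone hitting the same error, a small guard like the one below might help avoid starting a heavy run on the login node by accident. It is only a sketch: it assumes the session opened by Sinteract exposes the usual SLURM environment variables (in particular SLURM_JOB_ID), and both the helper name in_slurm_job and the script name my_conn_script.m are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a guard against running a heavy MATLAB/CONN script on a login node.
# Assumption: SLURM sets SLURM_JOB_ID inside an allocation (including the
# interactive session started by Sinteract); on the login node it is unset.

in_slurm_job() {
    # Succeeds (exit status 0) only when SLURM_JOB_ID is set and non-empty.
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside SLURM job ${SLURM_JOB_ID} on $(hostname)"
    # Start the analysis here. The script name is a placeholder:
    # matlab -nodisplay -r "run('my_conn_script.m'); exit"
else
    echo "No SLURM allocation detected: request a compute node first (e.g. with Sinteract)." >&2
fi
```

Run from the login node, this should only print the warning; run inside the Sinteract session, it should report the job ID and the compute node's hostname.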
Thanks
sophie