I am setting up an mri_reface pipeline on our SGE cluster. I've been experimenting with memory limits and multithreading. I'll share my results from my initial tests. If Chris et al. want to weigh in with any insider recommendations, I'd appreciate it, and I hope it may be useful to others.
I am using an Apptainer image created from the mri_reface Docker image.
Memory:
4 GB: stopped with a MATLAB Runtime out-of-memory error
8 GB: completed refacing but skipped generating the PNG snapshots, with this output:
Generating renders for QC use
deNoseWrap failed. Skipping deNoseWrap.
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 24 bytes for AllocateHeap
12 GB: ran with no memory errors or warnings
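One thing to watch when combining a memory limit with a parallel environment: on many SGE installations `h_vmem` is enforced per slot, not per job, so a 12 GB budget spread over 4 slots would be requested as 3 GB per slot. That behaviour is site-dependent, so treat the sketch below as an assumption to check with your admins. The helper name `per_slot_mem_mb` and the script name `reface_job.sh` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split a total memory budget (GB) across SGE slots.
# ASSUMPTION: h_vmem is enforced per slot on this cluster (site-dependent).
per_slot_mem_mb() {
    local total_gb=$1 slots=$2
    # Round up so the summed request is never below the intended budget.
    echo $(( (total_gb * 1024 + slots - 1) / slots ))
}

# 12 GB was the smallest size in the tests above that ran without errors.
# reface_job.sh is a placeholder for your own job script.
echo "qsub -pe smp 4 -l h_vmem=$(per_slot_mem_mb 12 4)M reface_job.sh"
```

The script only prints the `qsub` line so it can be inspected before anything is submitted.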
CPU:
Single threaded:
OMP: Warning #96: Cannot form a team with 4 threads, using 1 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
Using 4 threads (SGE submit script option `-pe smp 4`):
OMP: Warning #96: Cannot form a team with 20 threads, using 4 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
I note the docs say there's not much gain to multithreading so I may ignore these warnings. I have not tested run time for different numbers of threads yet.
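If the warnings become a nuisance, one thing that might be worth trying is asking OpenMP for exactly the number of slots the scheduler granted. This is untested with mri_reface, and it may have no effect if the program requests its thread count explicitly. Apptainer normally passes host environment variables into the container unless it is run with `--cleanenv`. The helper name `set_omp_threads` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: cap OpenMP threads at the slots SGE granted.
# ASSUMPTION: the OpenMP runtime inside the container honours OMP_NUM_THREADS.
set_omp_threads() {
    # NSLOTS is set by SGE inside a job; fall back to 1 when run elsewhere.
    export OMP_NUM_THREADS="${NSLOTS:-1}"
}

set_omp_threads
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Calling this near the top of the job script, before the container is launched, keeps the thread request and the `-pe smp` request in one place.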
I agree that there's not much to gain from multithreading. If you watch a process graph through a run, CPU usage is at 1.0 cores or less for most of the runtime. In-development versions will have considerable performance improvements without multithreading, though.
Here are our stats for the past 6 months of defacing jobs on our cluster:
11,374 jobs between 2025-10-29 and 2026-04-29.
MaxVMSize(GB)
Mean 6.23
Median 5.46
p95 21.05
p99 24.81
However, these jobs run from source code (not deployed/compiled) and are not containerized, so it may be different for you. We're running a variety of image types, and the highest usage typically comes from full-dynamic PET with 40+ volumes. For typical images with 1-5 volumes, I'd say 10 GB seems reasonable.
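For anyone who wants to produce the same kind of summary for their own array of jobs, here is a minimal sketch. It assumes you have already extracted one peak-memory value in GB per line from your scheduler's accounting (on SGE, `qacct -j` reports a `maxvmem` field, but the extraction step is left out because the output format varies). The function name `mem_summary` is hypothetical, and the percentiles use the nearest-rank method, so they may differ slightly from tools that interpolate.

```shell
#!/bin/bash
# Hypothetical helper: summarize peak memory values (GB, one per line on stdin).
# Percentiles use the nearest-rank method: rank = ceil(p/100 * N).
mem_summary() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        function pct(p,   k) {
            k = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
            if (k < 1) k = 1
            return v[k]
        }
        function median(   m) {
            m = int(NR / 2)
            return (NR % 2) ? v[m + 1] : (v[m] + v[m + 1]) / 2
        }
        END {
            if (NR == 0) exit 1            # no input: nothing to summarize
            printf "mean=%.2f median=%.2f p95=%.2f p99=%.2f\n",
                   sum / NR, median(), pct(95), pct(99)
        }'
}

# Example with made-up values, only to show the call:
printf '%s\n' 4 8 12 | mem_summary
```

With only a handful of jobs, p95 and p99 collapse to the maximum, so the upper percentiles are only informative once the array is reasonably large.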
I think you really want the PNG snapshots for QC purposes, but they definitely increase requirements and dependencies.
Chris
Useful info, thank you. When I have more info after running an array of jobs I'll update the thread.
I agree there's probably an overhead to running in a singularity container.
