Mar 5, 2026  03:03 PM | shiinsaad
Multi-dataset pipeline

Hi Alfonso,


I’m building a resting-state functional connectivity mega-analysis that combines multiple independent rs-fMRI datasets, and I’m now starting the preprocessing stage. I would really appreciate your guidance as I move forward.


Across datasets, there is substantial heterogeneity in both (i) study-level variables (number of subjects; number of visits per subject [1–4]; condition availability; demographics; study design [parallel vs crossover]; and stimulation parameters including active vs sham, stimulation site, intensity, duration, electrode shape/size, and potentially a “number of prior sessions” variable to capture consolidation/time effects for studies with repeated active visits) and (ii) acquisition-level variables (TR, slice timing information/slice order, and native voxel size).


I’m preprocessing using CONN in MATLAB (batch mode), aiming to follow CONN’s default MNI preprocessing pipeline as closely as possible.


Data organization


I keep a consistent structure across datasets, e.g.:
Study (01_Name) / Subject (S01) / Visit (V1) / anat and func (PRE, DURING, POST)


Each subject can have 1–4 visits (V1…V4). Each visit typically has one T1 and one 4D resting-state NIfTI per condition: PRE and POST, and in some datasets also DURING.


What I am doing in my batch script (so far tested on one dataset)


1. Loop subjects within a dataset
I run the batch one dataset at a time, looping over all subjects in that dataset.


2. Auto-detect available visits for the subject
For each subject, the script scans the subject metadata and automatically identifies available visit labels (V1…V4), then sorts them in visit order.


3. Build CONN sessions as visit × condition
For each detected visit, look for functional runs corresponding to each condition in a fixed order:
PRE -> DURING (if available) -> POST
Each available run becomes one CONN “session”, so the final session list is:
V1_PRE, V1_DURING, V1_POST, V2_PRE, … (depending on availability).
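To make steps 2–3 concrete, the visit/condition scan in my script looks roughly like the sketch below (folder and file names follow the layout described above; the dir-based detection and variable names are just my own conventions):

```matlab
% Sketch: enumerate CONN sessions as visit x condition for one subject.
% Assumes the layout Study/Subject/Visit/func/<COND>.nii described above.
subjDir = fullfile(studyDir, 'S01');      % one subject folder
visits  = {'V1','V2','V3','V4'};          % candidate visit labels, in visit order
conds   = {'PRE','DURING','POST'};        % fixed condition order
sessions = {};                            % functional file per CONN session
labels   = {};                            % e.g. 'V1_PRE'
for v = 1:numel(visits)
    for c = 1:numel(conds)
        f = fullfile(subjDir, visits{v}, 'func', [conds{c} '.nii']);
        if exist(f, 'file')               % skip missing visits/conditions
            sessions{end+1} = f;          %#ok<AGROW>
            labels{end+1}   = [visits{v} '_' conds{c}]; %#ok<AGROW>
        end
    end
end
% sessions{k} then becomes BATCH.Setup.functionals{nsub}{k}
```

so a subject with V1 (PRE/DURING/POST) and V2 (PRE/POST) ends up with five sessions, in the order listed above.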


My intent was to keep a transparent mapping between acquisition blocks and CONN sessions. In your opinion, for a multi-visit setup, is it better to:



  • keep conditions identical across visits (PRE/DURING/POST),

  • or define visit-specific conditions (V1_PRE, V2_PRE, etc.)?


4. One CONN project per subject (multi-session)


For each subject, I create one CONN project containing all that subject’s sessions (sessions = visit × condition).


This was mainly to simplify handling variable session counts and reduce failures when subjects are “incomplete” relative to others. However, I’m unsure if this is a good design for later group-level analyses and contrasts. Would you recommend instead:



  • one project per dataset (all subjects in that dataset), or

  • a single project for the full mega-analysis?


5. Structural assignment


At the moment, I provide one T1 per subject (selected from available visits, usually the earliest valid T1).


I initially tried using one T1 per visit/session, but it produced repeated segmentation outputs across visits (multiple c0/c1/c2 generations), and in the CONN GUI the structural preview looked distorted/tilted compared to the original T1.
From a best-practice perspective for multi-visit data: should I insist on using the visit-matched T1 (one per visit), or is one T1 per subject acceptable/preferable in CONN?


6. TR handling
The script reads TR from the JSON (RepetitionTime) when available; otherwise it falls back to a TR value from an external table. It also checks within-subject consistency across sessions.


In one case, a single session had a different TR in its JSON (e.g., 2.5 s where the other sessions were 3.0 s). The JSONs were generated from DICOM headers, and slice timing info was missing for that Philips scanner.
I’m not sure if this reflects conversion/metadata issues or true acquisition differences that should be modeled explicitly.
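The TR logic in my script is roughly the following (the external lookup table and threshold are my own choices; jsondecode requires MATLAB R2016b+):

```matlab
% Sketch: read TR from the BIDS sidecar JSON, fall back to an external table,
% and flag within-subject inconsistencies across sessions.
jsonFile = strrep(funcFile, '.nii', '.json');
if exist(jsonFile, 'file')
    meta = jsondecode(fileread(jsonFile));
    TR = meta.RepetitionTime;             % BIDS field, in seconds
else
    % hypothetical fallback table with one row per dataset
    TR = trTable.TR(strcmp(trTable.Dataset, datasetID));
end
% allTRs = TRs collected over this subject's sessions
if abs(TR - median(allTRs)) > 1e-3
    warning('TR mismatch for %s: %.3f s vs subject median %.3f s', ...
            funcFile, TR, median(allTRs));
end
```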


7. STC strategy
  • If SliceTiming is available in the JSON, STC is run using the BIDS SliceTiming values.

  • If SliceTiming is missing, the script falls back to a user-defined slice order (from an Excel sheet).

  • If neither is available, STC is skipped for that subject (and the omission is recorded in the log).
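The three-way fallback, sketched in MATLAB (variable names are mine; note my uncertainty about units in the comment):

```matlab
% Sketch: choose the slice-timing source per the three-way fallback above.
% 'customOrder' would come from the user-defined Excel sheet.
meta = jsondecode(fileread(jsonFile));    % decoded sidecar JSON, if present
doSTC = true;
if isfield(meta, 'SliceTiming')
    % CONN's sliceorder field accepts either a slice-order vector or slice
    % acquisition times; check units (s vs ms) against the conn_batch docs.
    BATCH.Setup.preprocessing.sliceorder = meta.SliceTiming(:)';
elseif exist('customOrder', 'var') && ~isempty(customOrder)
    BATCH.Setup.preprocessing.sliceorder = customOrder;  % e.g. interleaved indices
else
    doSTC = false;    % later drops 'functional_slicetime' from the steps list
    fprintf('STC skipped for %s (no slice timing info)\n', subjID);
end
```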


8. Preprocessing
I run an explicit steps list that matches CONN’s default MNI pipeline order, with functional_removescans explicitly placed first.


In practice:



  • If STC is available: removescans -> functional centering -> realign/unwarp -> slice-timing -> ART -> direct functional segmentation+normalization -> smoothing -> structural centering -> structural segmentation+normalization

  • If STC is not available: same order, but without the slice-timing step.
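As a steps list, I assemble this roughly as below (step names are my reading of the conn_batch documentation and the 'default_mni' pipeline; please correct me if any are off):

```matlab
% Sketch: explicit steps list mirroring the order described above,
% with removescans first and slice timing included conditionally.
steps = {'functional_removescans', ...
         'functional_center', ...
         'functional_realign&unwarp'};
if doSTC                                  % set by the slice-timing fallback logic
    steps{end+1} = 'functional_slicetime';
end
steps = [steps, {'functional_art', ...
                 'functional_segment&normalize_direct', ...
                 'functional_smooth', ...
                 'structural_center', ...
                 'structural_segment&normalize'}];
BATCH.Setup.preprocessing.steps = steps;
BATCH.Setup.preprocessing.fwhm  = 8;      % smoothing kernel in mm (see question B)
```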


------------------------------


A) Main issue I’m currently debugging: intermittent “mean functional file not found”


When running this multi-session setup through the default-MNI-style pipeline with mean-based coregistration enabled, I intermittently get:
“Mean functional file not found (associated with functional data … au*.nii)”


It generates a mean image for the first session (V1_PRE), but then fails to find/generate the expected mean for later sessions (e.g., V1_DURING/POST, V2 sessions).
Have you seen this behavior before, and do you have any idea what in the multi-session setup could trigger it (e.g., project structure, session ordering, conditions, or something else)?


I would like to keep mean-based coregistration (i.e., not switch to coregtomean=0). As a temporary workaround, I handle this specific failure in the script as follows:



  • I run conn_batch in a try/catch.

  • If the error message contains “Mean functional file not found (associated with functional data … au*.nii)”, I parse the missing au*.nii path from the message.

  • I generate a mean image directly from that 4D file using SPM (mean across volumes, saved in the same folder).

  • I retry conn_batch once.
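The workaround, sketched in MATLAB (the regexp pattern is illustrative; the "mean" filename prefix follows SPM's convention, but whether CONN picks this file up as intended is exactly what I'm unsure about):

```matlab
% Sketch of the try/catch workaround around conn_batch.
try
    conn_batch(BATCH);
catch err
    tok = regexp(err.message, 'functional data (\S+au[^\s)]*\.nii)', 'tokens', 'once');
    if ~isempty(tok)
        f  = tok{1};
        V  = spm_vol(f);                        % one header per volume of the 4D file
        Y  = mean(spm_read_vols(V), 4);         % voxelwise mean across volumes
        Vm = V(1);                              % reuse first volume's header
        Vm.n = [1 1];
        [p, n] = fileparts(f);
        Vm.fname = fullfile(p, ['mean' n '.nii']);
        spm_write_vol(Vm, Y);                   % write mean image next to the 4D file
        conn_batch(BATCH);                      % retry once
    else
        rethrow(err);                           % any other error propagates
    end
end
```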


This resolves the failure. From the output files, it seems that CONN (with direct normalization) uses the V1_PRE “reference mean functional” to estimate the normalization and then applies that transform to all sessions. In that case, the later sessions do not have session-specific ART-related outputs (e.g., y_art_mean_au… and related files), and I am not sure whether this is correct.


Related questions:



  • Is direct functional segmentation+normalization the recommended choice here, or would indirect normalization be better practice in a multi-visit context (and why)?

  • Do you recommend one T1 per subject or one T1 per visit when each visit has its own T1?


------------------------------


B) Smoothing question


Most datasets have native voxel sizes around ~3–3.75 mm, but I also have one higher-resolution dataset (1×1×2 mm). We already discussed this in a previous post; however, the study author suggested that an 8 mm kernel may over-smooth their data. In your opinion, should I:



  • keep 8 mm for all datasets for maximal consistency, and later model site/dataset effects (e.g., covariates/ComBat), or

  • use a smaller kernel (e.g., 6 mm) for all datasets, or

  • use different kernels by dataset (e.g., 6 mm for the high-resolution dataset and 8 mm for the others), and then handle this difference via harmonization/covariates, or

  • run two pipelines (6 mm and 8 mm) and compare the stability of results?
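For the third option, one rule of thumb I have considered is matching effective rather than applied smoothness, treating Gaussian smoothness as adding in quadrature and (crudely) using native voxel size as a proxy for intrinsic smoothness. This is only a back-of-the-envelope approximation:

```matlab
% Sketch: applied kernel needed so that effective smoothness roughly matches
% a target, using FWHM_eff^2 ~= FWHM_applied^2 + FWHM_intrinsic^2.
target    = 8;                    % desired effective FWHM (mm)
intrinsic = [3.75, 2];            % crude proxies: low-res vs high-res dataset (mm)
applied   = sqrt(max(target^2 - intrinsic.^2, 0));
fprintf('applied kernels: %.2f mm (low-res), %.2f mm (high-res)\n', applied);
```

I would not rely on this alone, but it suggests the high-resolution dataset could tolerate a somewhat smaller kernel while keeping effective smoothness comparable.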


------------------------------


C) If you think my project organization (one project per subject vs per dataset) or the way I define sessions/conditions needs revision, I would be very grateful for your recommendations on how to structure CONN batch preprocessing for this multi-dataset mega-analysis.


------------------------------


D) More broadly, I’d appreciate any critique of my current strategy (design choices, weak points, ...) and how you would recommend I proceed. 
This is my first attempt working with fMRI and CONN, so I may be missing best practices for handling study-level parameters (active/sham, site, montage, dose, etc.). My assumption is that most of those variables will be handled after preprocessing, unless you think they should influence preprocessing decisions.


------------------------------


Thank you very much,


Best regards,
Shiva
