Dec 19, 2019  07:12 AM | Yaroslav Halchenko
RE: Duplicate data in ABIDE
FWIW, confirming: the two files are not bit-identical (their sizes differ by 1 byte, 2305710 vs. 2305711 per the annex keys), but otherwise the data seems to be the same according to nib-diff:
```
$> git clone https://github.com/ReproNim/openneurolab...
$> cd openneurolab-metasearch-dataset

$> ls -ld ./abide_initiative/sub-50279/ses-1/T1_rep-0.mgz ./abide_initiative/sub-50286/ses-1/T1_rep-0.mgz
lrwxrwxrwx 1 yoh yoh 137 Dec 19 10:42 ./abide_initiative/sub-50279/ses-1/T1_rep-0.mgz -> ../../../.git/annex/objects/jf/1K/MD5E-s2305710--77704a0c155603bc634b8b391f7736a5.mgz/MD5E-s2305710--77704a0c155603bc634b8b391f7736a5.mgz
lrwxrwxrwx 1 yoh yoh 137 Dec 19 10:42 ./abide_initiative/sub-50286/ses-1/T1_rep-0.mgz -> ../../../.git/annex/objects/xQ/6P/MD5E-s2305711--f331d20a1f5b7c2ceaad6c0eccfe5f92.mgz/MD5E-s2305711--f331d20a1f5b7c2ceaad6c0eccfe5f92.mgz

$> datalad get ./abide_initiative/sub-50279/ses-1/T1_rep-0.mgz ./abide_initiative/sub-50286/ses-1/T1_rep-0.mgz
get(ok): abide_initiative/sub-50286/ses-1/T1_rep-0.mgz (file) [from web...]
get(ok): abide_initiative/sub-50279/ses-1/T1_rep-0.mgz (file) [from web...]
action summary:
get (ok: 2)

$> nib-diff ./abide_initiative/sub-50279/ses-1/T1_rep-0.mgz ./abide_initiative/sub-50286/ses-1/T1_rep-0.mgz
These files are identical.
```
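One way to narrow down where the 1-byte difference lives: `.mgz` is a gzip-compressed MGH file, so the decompressed payloads can be compared separately from the files as stored on disk. The sketch below is only an illustration using the Python standard library; the helper names (`payload_digest`, `file_digest`, `compare_gzipped`) are made up for this example and are not part of nibabel, datalad, or git-annex.

```python
import gzip
import hashlib


def payload_digest(path):
    """SHA-256 of the decompressed contents of a gzip file such as .mgz."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as fh:
        # Read in 1 MiB chunks so large volumes do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_digest(path):
    """SHA-256 of the file bytes exactly as stored on disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_gzipped(path_a, path_b):
    """Return (same_on_disk, same_after_decompression) for two gzip files."""
    return (
        file_digest(path_a) == file_digest(path_b),
        payload_digest(path_a) == payload_digest(path_b),
    )


if __name__ == "__main__":
    import sys

    # Usage: python compare_mgz.py FILE_A.mgz FILE_B.mgz
    on_disk, decompressed = compare_gzipped(sys.argv[1], sys.argv[2])
    print("identical on disk:            ", on_disk)
    print("identical after decompression:", decompressed)
```

If the second value comes back `True`, the difference sits in the gzip container (for example header metadata) rather than in the MGH content. If it comes back `False`, the difference is presumably in a part of the MGH file that nib-diff does not report; I have not checked which case applies to these two files.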

In my case the files come from the fcp-indi S3 bucket:
```
$> git annex whereis ./abide_initiative/sub-50279/ses-1/T1_rep-0.mgz ./abide_initiative/sub-50286/ses-1/T1_rep-0.mgz
whereis abide_initiative/sub-50279/ses-1/T1_rep-0.mgz (3 copies)
00000000-0000-0000-0000-000000000001 -- web
9ed025be-5276-4e8a-a1fc-d82a04514147 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/labs/openneurolab/metasearch
d9a685f3-b192-412c-b798-1a12957284ea -- yoh@lena:/tmp/openneurolab-metasearch-dataset [here]

web: https://s3.amazonaws.com/fcp-indi/data/P...
ok
whereis abide_initiative/sub-50286/ses-1/T1_rep-0.mgz (3 copies)
00000000-0000-0000-0000-000000000001 -- web
9ed025be-5276-4e8a-a1fc-d82a04514147 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/labs/openneurolab/metasearch
d9a685f3-b192-412c-b798-1a12957284ea -- yoh@lena:/tmp/openneurolab-metasearch-dataset [here]
web: https://s3.amazonaws.com/fcp-indi/data/P...
ok
```
