The number of entries in the new test.root file is different from that in the original 3.root file. This only happens in the version 6.18.00 (not in 6.16.00) when the Implicit MT is turned on, and not for all the files. Actually this file dependence seems to be different between my linux desktop and mac laptop.
I’m sorry that I can’t provide a reproducer here. That file is still in use and I’m not sure if I can make it public.
Hi,
thank you for the report! I’m not aware of any race condition in Snapshot that might cause this, nor changes between 6.16 and 6.18 that might cause a regression of such importance.
It would be very, very useful if you could share reproducer+input file privately with me so I can debug what’s going on.
The reproducer is pretty much what’s in the original post. But could you please try both of the files separately? It might be machine/platform dependent. And please let me know if you can’t reproduce the problem.
All numbers printed match. I tested ROOT master (debug build) as well as v6.18 on lxplus (sourced with . /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.18.00/x86_64-centos7-gcc48-opt/bin/thisroot.sh).
Any idea? Without a repro, there is no way to debug/fix this.
I didn’t try this on lxplus before. I just did it and as you said, it can’t be reproduced on lxplus.
However, I ran root with time and I noticed that the (user + sys) / real time ratio is about 15% (<100%) even with IMT on. Locally on my desktop, it’s >100%. I don’t know if this is because of the hard drive or eos performance. But I suspect this ratio matters if it’s a data race problem.
Although this can be a little annoying, may I suggest you run it locally (preferably on a machine with SSD)? Thank you!
Hi,
It wasn’t clear from my message, but the test on ROOT master was performed on my workstation, using 8 cores and reading from SSD. There shouldn’t be any major differences in behavior between v6.18 and master, bit it would be great if you could check that you also see this behavior with ROOT master.
Indeed on lxplus most of the time is spent waiting for EOS.
input file 1: 612031
input file 2: 1956041
total: 2568072
read: 2557591
written on file: 2557591
(And even when I comment out the Snapshot and only leave the Count there the “read” number is still the same. Sorry that I didn’t realize this before.)
I think I have this problem also with a version I compiled on Mar 20 from heads/master@v6-16-00-rc1-1339-g2ebe61ce28.
Can this be related to the IO optimization?
This problem is also on an old desktop (results might be slightly different though). I think the cpu is I5-5257U so 2 cores 4 threads. It’s SSD. Thanks!
Hi,
so with some digging it turned out this was a critical bug in TTreeProcessorMT, here is the relevant jira ticket I opened.
This PR should fix. It will be included in the next v6.18 patch release.
Thank you very much for reporting
I remain at your disposal for any further questions. The warnings are not linked to this bug, and in fact I did not see any warnings when running on the data you shared.
The fix was merged into v6.18-00-patches and master.
You can test it by building ROOT or, from tomorrow, using the nightly builds of ROOT available on cvmfs, e.g. source /cvmfs/sft.cern.ch/lcg/views/dev3/Fri/x86_64-centos7-gcc8-opt/setup.sh (from tomorrow, the current Friday nightly is the one from last week).
You can also try right now on Linux with the prefix stack I use to test ROOT (not an official service). It has ROOT built nightly and deployed to CVMFS. Works on any Linux distribution with kernel 2.6.32 and above:
$ /cvmfs/sft.cern.ch/lcg/contrib/gentoo/startprefix
Entering Gentoo Prefix /cvmfs/sft.cern.ch/lcg/contrib/gentoo/linux/x86_64
$ root --version
ROOT Version: 6.19/01
Built for linuxx8664gcc on Jul 11 2019, 00:01:00
From git-r3/HEAD@v6-19-01-427-g0d60f6de6b