Dear ROOT Developers,
I recently ran into a ROOT usage problem that may interest you. Its
solution may suggest a minor ROOT code change or some changes to the
documentation. The issue is important because, at least in this case,
valid data was not processed as expected, with no warning messages of
any kind. First, let me describe the scenario.
We use a locally developed standard tool, a ROOT macro, which
creates a TChain of input ntuple datasets, applies a TCut and outputs
the result via CopyTree(), an operation we refer to as “pruning”. We
use a web page front-end to assemble data for the job, and then submit
a batch job to run the ROOT macro. The user depends on the return
code to indicate success or failure. Typically, a user may not
even look at the output log beyond the return code.
The case of interest included 792 separate files, for a grand total
of 1,885,273,770 events. The user had specified a TCut which selected
a contiguous group of ~21k events in the 540th file in the TChain
(although he did not know this beforehand). When the job ended, the
return code was zero, and no events had been delivered to the output
stream.
Inside the ROOT macro, there are two relevant functions to
accomplish this task.
    chain->Add(filename);        // Build the TChain of files
    tree->CopyTree(selection);   // Create output file based on TCut criteria
What was not at all obvious from the documentation (ROOT Reference
Manual) is that in both cases arbitrary limits were imposed by
default.
In the case of chain->Add(filename), the documentation discusses
the second “nentries” parameter but implies that the default (case
“C”, nentries = kBigNumber) is safe to use. I do not know a priori
how many events are in each file of the requested TChain, so I never
thought about using case “B” (a caller-supplied positive entry count).
And case “A” (nentries <= 0, which opens each file to read its entry
count), while ultimately appropriate for this situation, is made out
to be an inefficient mode when the files are to be read sequentially
(which they are).
[ref root.cern.ch/root/html402/TChain.html#TChain:Add ]
So I changed the code to the following, which seems to work:
    chain->Add(filename,0);
Now, about the documentation and code, may I suggest a few changes:

- First, could the documentation include a link to the value of
  kBigNumber rather than to its type declaration? This would help the
  user quickly find its value.

- Next, in today’s world of high energy (and astro) particle physics,
  the current value of kBigNumber, 1,234,567,890, is hardly a “big
  number” any more. Add to that the fact that the “nentries” parameter
  is a Long64_t and kBigNumber becomes laughably small! How about
  modernizing this value? Or perhaps case “C” is no longer a good
  default?

- Is it possible and reasonable for the user to specify an
  arbitrarily large value for “nentries” without incurring a
  significant performance penalty? For example, if I were to routinely
  specify nentries=999,999,999,999,999,999 would this hinder
  performance? Would doing so be more desirable than specifying
  nentries=0?

- Most importantly, when chain->Add() decides to stop accumulating
  events (having reached nentries), could it please emit a warning
  message to the user just in case that was not the intended action?
  This was, in my opinion, the most dangerous part of this situation:
  ROOT quietly threw events on the floor and, at least in some (many?)
  cases, the user would be none the wiser.
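
To put rough numbers on this, the sketch below estimates where the
540th file begins in the chain's global entry numbering and compares
that position with an entry-count limit such as kBigNumber. It assumes,
purely for illustration, that all 792 files hold the same number of
events (the real files surely differ). The helper names
firstEntryOfFile and limitStopsBeforeFile are hypothetical and are not
part of ROOT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of ROOT.
// ASSUMPTION: every file holds the same number of entries, which is only
// a rough approximation of a real chain.
// Returns the 0-based global entry number at which file `fileIndex`
// (1-based) would begin.
std::int64_t firstEntryOfFile(std::int64_t fileIndex,
                              std::int64_t totalEntries,
                              std::int64_t nFiles) {
    assert(nFiles > 0 && fileIndex >= 1 && fileIndex <= nFiles);
    // Multiply before dividing to keep precision; for the numbers in this
    // report the product is about 1.0e12, well within 64 bits.
    return (fileIndex - 1) * totalEntries / nFiles;
}

// Hypothetical helper: true if a limit of `limit` entries is exhausted
// before the first entry of file `fileIndex` is reached.
bool limitStopsBeforeFile(std::int64_t limit,
                          std::int64_t fileIndex,
                          std::int64_t totalEntries,
                          std::int64_t nFiles) {
    return limit <= firstEntryOfFile(fileIndex, totalEntries, nFiles);
}
```

With the numbers from this report, firstEntryOfFile(540, 1885273770,
792) comes out at roughly 1.28 billion, which lies beyond kBigNumber =
1,234,567,890. Under the equal-size assumption, a limit of kBigNumber
entries would therefore stop before the selected events are reached.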
Unrelated question: Add() seems to be a superset of AddFile(). Is
there any advantage to using AddFile() over Add(), when possible to do
so?
===
In the case of tree->CopyTree(), the documentation says nothing at
all about the 2nd, 3rd and 4th parameters. (This is not the only
function with this symptom.) Tracking down the meaning of these
parameters - in the code - eventually led me to these changes:
    Long64_t nentries = chain->GetEntries();
    tree->CopyTree(selection,"",nentries);
because the “invisible” default value for nentries is 1,000,000,000,
and its meaning was a surprise: “nentries” refers to the maximum
number of events read from the input stream, not the maximum written
to the output stream!
[ref root.cern.ch/root/html402/TTree. … e:CopyTree ]
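
Since users rely on the return code, a caller-side check along the
following lines could at least turn a silent truncation into a
non-zero status. This is only a sketch under stated assumptions: the
function name readLimitStatus and the choice of return codes are
hypothetical, nothing here is part of ROOT, and in a real macro the
total would come from chain->GetEntries() as in the snippet above.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical helper, not part of ROOT.
// Returns 0 if a read limit of `nentriesLimit` covers all `totalEntries`
// input entries, and 1 (after printing a warning to stderr) if some
// input entries would never be read.
int readLimitStatus(std::int64_t totalEntries, std::int64_t nentriesLimit) {
    if (nentriesLimit >= totalEntries) {
        return 0;  // every input entry is within the limit
    }
    std::cerr << "WARNING: read limit of " << nentriesLimit
              << " entries leaves " << (totalEntries - nentriesLimit)
              << " of " << totalEntries << " input entries unread\n";
    return 1;
}
```

For the job described here, readLimitStatus(1885273770, 1000000000)
returns 1, whereas passing the chain's own entry count as the limit
returns 0.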
- Could all parameters for every function please be properly
  documented, along with default values?

- Again, the most dangerous part of this situation is that ROOT lets
  data quietly fall on the floor. When CopyTree() runs into its
  "nentries" limit, could it please emit a warning to the user?

- Does ROOT have a “verbose mode” by which long-running, monolithic
  functions (e.g. CopyTree()) can be monitored? For example, to emit a
  heartbeat INFO message every 1M events could be useful for
  diagnostics.
===
For the record, most of my testing was done with ROOT v4.02 but a
limited amount was done with v5.10 to verify similar behavior. The
machine type was Linux RHEL 2.4.21-47 (2xCPU 2GB).
Thank you for your consideration,
- Tom Glanzman
SLAC