Feature request: a special exception to gracefully interrupt RDataFrame loops

malfonsi79 · April 4, 2019, 10:55pm

Dear ROOT developers,

I am wondering whether it would be possible to implement a specific exception that can gracefully interrupt an RDataFrame loop. By "gracefully" I mean, for example, that the output file produced by Snapshot() is properly closed with the events created so far.

I find the option to create empty dataframes and obtain a new tree via the Define method very useful: it can simplify the creation of new trees. The limitation is that I need to know the total number of events in advance.

I have not investigated the RDataSource facility in depth, but at first glance it seems to have the same limitation.

This is why I think that an interruption mechanism based on exceptions (I cannot think of any other) that can be thrown from user code could be beneficial.

Best,
Matteo

Danilo · April 5, 2019, 7:12am

Hi Matteo,

there is nothing dedicated to that yet; on the other hand, you can implement that behaviour with filters. For example:

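bool skip = false;
ROOT::RDataFrame df(absoluteMaxEntries); // upper bound on the number of entries
df.Filter([&skip] { return !skip; }) // drop every entry once skip is set
  /* ...rest of the chain... */
  .Filter([&skip] { /* check some condition, set skip = true if needed */ return !skip; })
  .Snapshot(....);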

Would something like that work for you? Note that in MT mode, unless some synchronisation is used, you will get a few more entries than you originally wanted.

Cheers,
D
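
A minimal sketch of why a shared stop flag can overshoot in MT mode, written in plain C++ rather than with RDataFrame. RunWorkers, the per-thread entry blocks and the target count are hypothetical stand-ins for illustration, not ROOT code: each worker checks the flag before an entry, so a worker that has already passed the check still accepts its entry after another worker raises the flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a multi-threaded event loop; not ROOT code.
// Each worker handles its own block of entries and checks a shared stop flag
// before every entry. The flag is raised once `target` entries were accepted.
// Returns the total number of accepted entries.
std::size_t RunWorkers(unsigned nThreads, std::size_t target)
{
   assert(nThreads > 0);
   std::atomic<bool> stop{false};
   std::atomic<std::size_t> accepted{0};

   auto worker = [&] {
      // Each worker may see up to `target` entries, so the target is always reachable.
      for (std::size_t i = 0; i < target; ++i) {
         if (stop.load())
            break;
         // The entry is accepted here. Other workers may be at this same point,
         // having passed the check just before the flag was raised.
         const std::size_t n = accepted.fetch_add(1) + 1;
         if (n >= target)
            stop.store(true);
      }
   };

   std::vector<std::thread> pool;
   for (unsigned t = 0; t < nThreads; ++t)
      pool.emplace_back(worker);
   for (auto &th : pool)
      th.join();
   return accepted.load();
}
```

In this sketch the result is at least target and at most target + nThreads - 1, since each worker can accept at most one entry after the target has been reached. Getting exactly target entries would need extra synchronisation around the check and the accept step.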

Axel · April 5, 2019, 7:30am

Maybe use a std::atomic_bool fCancel in the entry loops? Something like: for (entries...) if (fCancel) break; else run RDFNodes?
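
The cancel-flag idea can be sketched in plain C++ as follows. This is only an illustration under assumptions: RunLoop and processEntry are hypothetical names, and nothing here reflects how the RDataFrame entry loop is actually implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for an entry loop; not ROOT code.
// Calls `processEntry` on entries 0..maxEntries-1 and stops before the next
// entry as soon as `cancel` is set. Returns the number of entries processed.
std::size_t RunLoop(std::size_t maxEntries,
                    std::atomic<bool> &cancel,
                    const std::function<void(std::size_t)> &processEntry)
{
   assert(processEntry); // a callable must be provided
   std::size_t processed = 0;
   for (std::size_t entry = 0; entry < maxEntries; ++entry) {
      if (cancel.load())
         break; // leave the loop normally, so code after the loop still runs
      processEntry(entry);
      ++processed;
   }
   return processed;
}
```

User code sets the flag from inside processEntry (or from another thread). Because the loop exits through a normal break rather than an exception, whatever finalisation follows the loop, such as closing an output file, is still reached.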

eguiraud · April 5, 2019, 7:47am

Hi,
This is https://sft.its.cern.ch/jira/browse/ROOT-9372.

The idea is to have an Until method that acts like a Range but on a user-specified condition.

Not sure about multi-threading support though, that will be tricky.

Cheers,
Enrico
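
To illustrate the difference between a Range and the proposed Until in a single-threaded setting, here is a small plain C++ sketch. TakeRange and TakeUntil are hypothetical helpers working on a std::vector; they are not ROOT API and say nothing about how Until would eventually be implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not ROOT API: keep entries [begin, end), like a fixed range.
template <typename T>
std::vector<T> TakeRange(const std::vector<T> &entries, std::size_t begin, std::size_t end)
{
   assert(begin <= end);
   std::vector<T> kept;
   for (std::size_t i = begin; i < end && i < entries.size(); ++i)
      kept.push_back(entries[i]);
   return kept;
}

// Hypothetical helper, not ROOT API: keep entries until `stop` first returns
// true. The entry that triggers the stop is not kept.
template <typename T, typename Pred>
std::vector<T> TakeUntil(const std::vector<T> &entries, Pred stop)
{
   std::vector<T> kept;
   for (const T &e : entries) {
      if (stop(e))
         break;
      kept.push_back(e);
   }
   return kept;
}
```

With a range the number of entries must be known before the loop starts, while with an until-style condition the end point is discovered during the loop. The sketch relies on entries being visited in order, which is one reason why multi-threading support is less obvious.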

malfonsi79 · April 12, 2019, 9:10am

Dear all,

thanks for the suggestions, which seem to do the job, and it is nice to learn that you are already planning a solution for future versions.

My idea of using exceptions came from the fact that sometimes the condition to gracefully interrupt the main loop could arise from some special situation occurring in one of the end leaves of the evaluation graph… but I admit that I should come up with a more concrete case to demonstrate that Danilo's solution is not viable.

About MT, at the moment I do not find it really beneficial for my lab-size projects. Maybe with the structure of the LHC data and the computing infrastructure available for the LHC you can see a big boost, but so far in all my cases, even with moderately intensive calculations, the bottleneck is just reading the data from disk, so I stopped trying it.

eguiraud · April 12, 2019, 9:44am

Interesting… RDF multi-threading parallelizes ROOT I/O too (e.g. different clusters of entries are decompressed in parallel), so it should be beneficial. You should see no speed-up only if one thread already saturates the disk I/O bandwidth, which is usually not the case.
Is this with data on SSD, spinning disk or read via network?

Cheers,
Enrico

malfonsi79 · April 12, 2019, 10:59am

Standard PC spinning disk, I assumed that the disk bandwidth was saturated.
But do not conclude from my statement that I ran any special benchmarks beyond timing the command that launches the script in the two cases (no MT and MT with 6 threads). I can probably collect more accurate information on some other occasion (I have always wanted to write a congratulation/feedback thread on the RDataFrame feature, but as usual “spare time” is the real issue); otherwise we go off-topic and waste real time on hypotheses.

Thanks again,
Matteo

eguiraud · April 12, 2019, 12:08pm

> Standard PC spinning disk, I assumed that the disk bandwidth was saturated.

That would also be my guess. We typically test on SSDs, and commonly see almost-linear speed-ups on machines with 4 or 8 cores.

> always wanted to write a congratulation/feedback thread on the RDataFrame feature

Congratulations are welcome, but feedback is precious!

system · April 26, 2019, 12:08pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.