Hello,
I was excited to read that an RDataFrame constructed from a TChain with a TEntryList attached should respect that entry list (although older forum threads about this suggest there may be some caveats, which I still need to study in more detail). What I wanted to check is whether there is a way to create the TEntryList in a multithreaded run.
The situation is that I have data spread over many files, and I am currently processing it with RDataFrame from a TChain. However, I frequently want to run over the data again just to plot something different, without changing the event selection. Ideally, on my first run over all the data I would build a TEntryList and save it to a file, and then in subsequent runs I could process the chain with this entry list applied. It's a lot of data, so I imagine I'd need to use TEntryListFromFile for this, as I worry the full TEntryList may be too big for memory.
But am I correct in thinking that I cannot build a TEntryList in a multithreaded run because the entry number isn't reliably available? I saw there is a DefineSlotEntry method, but I assumed the entry number it provides is a thread-local entry number rather than the entry number in the input data (why isn't it possible to get the global entry number in the thread?). So am I right that I'd have to build the TEntryList in a single-threaded run with a custom action (is there an example of this available?), and only then could I process in parallel?
Just to add: since the data is spread over lots of files, and I think I would need TEntryListFromFile and hence a separate TEntryList for each file of the chain, there is another option. If multiple files could be processed multithreaded with each thread taking care of a single file (no file shared between threads), I could build my entry lists that way. But I don't think that mode of event looping is supported, is it?
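To make the per-file idea concrete, here is a minimal sketch in plain Python (no ROOT calls at all) of the bookkeeping I have in mind. It assumes each file gets its own list of file-local entry numbers, and that a global entry number in the chain is the file's starting offset plus the local entry number. The function names and the example numbers are made up for illustration; in a real job the per-file entry counts would have to come from the chain itself.

```python
from itertools import accumulate


def file_offsets(entries_per_file):
    """Return the global entry number at which each file starts in the chain."""
    # Exclusive prefix sum: file i starts after all entries of files 0..i-1.
    return [0] + list(accumulate(entries_per_file))[:-1]


def to_global_entries(entries_per_file, selected_per_file):
    """Merge per-file lists of local entry numbers into one sorted global list."""
    offsets = file_offsets(entries_per_file)
    merged = []
    for offset, n_entries, selected in zip(offsets, entries_per_file, selected_per_file):
        for local in sorted(selected):
            # A local entry number must lie inside its own file.
            if not 0 <= local < n_entries:
                raise ValueError(f"local entry {local} outside file with {n_entries} entries")
            merged.append(offset + local)
    return merged


# Hypothetical example: three files with 100, 50 and 200 entries,
# and the file-local entries that passed the selection in each file.
entries_per_file = [100, 50, 200]
selected_per_file = [[3, 42], [], [0, 199]]
global_entries = to_global_entries(entries_per_file, selected_per_file)
```

If each thread only ever touched one file, the per-file lists could be filled independently and combined afterwards like this, with no need for a global entry number inside the thread.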
I'd be happy to hear otherwise, and that what I want to do is possible after all.
Thanks!
Will