Fast way to read many files with a TChain and build a library from them afterwards?

Hello

In my analysis I have several hundred files that I need to run over. So far I have been merging them with hadd and then running on one huge file, which is inconvenient. Moreover, the merging sometimes fails, even with the -k flag, with errors like [*] that make it very hard to tell whether a file is actually corrupted. Even with multithreaded jobs, the job sometimes loses its connection to the node.

I also looked into using a TChain over all the .root files, but looping over all the nAOD files I need is, again, extremely slow.

So here is my question: is there a way to build a TChain once and then store it as a dictionary or some kind of precompiled library, so that I do not have to read the .root files from scratch every time I run? Each file contains a TTree and a couple of TH1 objects.

I hope my question makes sense.
Thanks
Alex

[*]
hadd Target path: ./parts//partial6_7d570eaa-7fcb-11ee-bee1-96bde183beef.root:/
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-30095) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-1526730564) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-1471649464) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-685220469) ; trying to recover by setting it to zero
hadd Opening the next 1 files
hadd Target path: ./parts//partial5_7d570eaa-7fcb-11ee-bee1-96bde183beef.root:/
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-1536104923) ; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fIOBits (00000000000000000000000001111100) contains unknown flags (supported flags are 00000000000000000000000000000001), indicating this was written with a newer version of ROOT utilizing critical IO features this version of ROOT does not support. Refusing to deserialize.
hadd Opening the next 1 files

Dear @Alkass ,

In principle, the fastest (and safest) way to process the files of a TChain is with RDataFrame. See in particular this section for an example of how to run on a chain. With RDataFrame you also get the ability to use all the cores of your machine for free.
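As a minimal sketch in PyROOT (not taken from the linked section), the snippet chains all .root files of a directory and lets RDataFrame process them in a single multithreaded event loop, with no hadd step. The tree name "Events" (the usual nAOD name), the branch nMuon in the example cut, and the helper names list_input_files and count_events are hypothetical; adapt them to your files.

```python
import glob
import os


def list_input_files(directory, pattern="*.root"):
    """Return the files in `directory` matching `pattern`, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def count_events(files, tree_name="Events", cut="nMuon >= 2"):
    """Count all entries and the entries passing `cut` over a chain of files.

    Assumptions (hypothetical, adapt to your files): every file holds a TTree
    named `tree_name` ("Events" in nAOD) and the branch used in `cut` exists.
    """
    import ROOT  # imported here so list_input_files() works without ROOT

    ROOT.EnableImplicitMT()  # must be called before the RDataFrame is created

    chain = ROOT.TChain(tree_name)
    for path in files:
        chain.Add(path)

    df = ROOT.RDataFrame(chain)
    n_total = df.Count()             # lazy: nothing is read yet
    n_pass = df.Filter(cut).Count()  # also lazy, booked on the same event loop

    # The first GetValue() triggers one multithreaded pass that fills both results.
    return n_total.GetValue(), n_pass.GetValue()
```

For example, count_events(list_input_files("./parts")) would run over every file in ./parts directly. Because RDataFrame actions are lazy, everything booked before the first GetValue() is computed in the same pass over the files.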

I am not sure I understand the second part of the question, about storing a chain of files as a precompiled library. Whenever you re-run the analysis, you will need to re-open the files anyway.

Cheers,
Vincenzo
