RDFs using multiple threads without `EnableImplicitMT()`

Hi @eguiraud @vpadulan,

I’m back again with an issue regarding how RDFs optimise processing and resource usage.

You might recall from this thread that I have a skimming code which uses RDFs to filter and clean up large n-tuples.

In this code, I simply loop over some trees in the file, use `DefineSlot()` to add some columns, then `Filter()` to apply some cuts. The main loop of the code is simply:

        for (unsigned int i=0;i<skimmedTreenames.size();i++) {

            TChain *tree = new TChain(TStringIt(skimmedTreenames[i]));
            int chained = tree->Add(TStringIt(inpaths[f]).Data());
            if(!tree->GetEntries()){delete tree; continue;}

            log(LOG_INFO) << "Skimming the following tree: " << skimmedTreenames[i] << '\n';

            //info is different in particleLevel tree
            bool isparticleLevel = false;
            if (skimmedTreenames[i] == "particleLevel") isparticleLevel = true;

            /*
            multi-threaded event loops. More details for a MT safe cluster submission here:
            https://root.cern/doc/master/classROOT_1_1RDataFrame.html#parallel-execution
            HTCondor CPUs request here: https://batchdocs.web.cern.ch/local/submit.html#resources-and-limits
            */
            // ROOT::EnableImplicitMT();
            ROOT::RDataFrame df(*tree);
            ROOT::RDF::RNode d = df;

            ROOT::RDF::RSnapshotOptions opts;
            opts.fMode = "UPDATE";

            // Add dummy variable slots
            d = AddDummySlots(d, isMC, isSyst);
            // Add the MC reconstructed objects (jets, leptons,...)
            d = AddObjectSlots(d, isMC);
            // Filter the dataframe to pre-selection
            d = d.Filter(std::get<1>(presel), std::get<2>(presel), std::get<0>(presel).c_str());
            // Add the variables that should be kept for any NTuple (Data, MC, Nominal & Syst)
            d = AddVariablesForAll(d);
            // Add nominal-only or syst-only variables
            d = isSyst ? AddVariablesForSyst(d, CaliMaps) : AddVariablesForNominal(d);
            // Add MC only or data only variables
            d = isMC   ? AddVariablesForMC(d): AddVariablesForData(d);
            /* Add MC Metadata ... */
            d = isMC ? AddMCMetadata(d, sw_totals): d;
            // Defining progress bar
            d.Count().OnPartialResult(/*every */100/* events*/, [&log](auto c) { log(LOG_INFO) << c << " events processed\n"; });
            log(LOG_INFO) << "Snapshotting the file... "  << '\n';
            CleanUpAndSave(d, skimmedTreenames[i], outpaths[f], opts);

            delete tree;
        }


        log(LOG_INFO) << " outfile -->>>  : " << outpaths[f] << '\n';

        // End timer and print time elapsed
        log.time_since_last_snap();
        log(LOG_INFO) << '\n';
    }

    log(LOG_INFO) << "DONE :)" << '\n';

The only jitted code I have is the `Snapshot()` call inside the `CleanUpAndSave()` function.

Due to the large sizes of the N-tuples we are processing, I am trying to run this on the GRID, which is where we spotted issues with the resource usage of the code.

Although the code does not hoard much memory when executed locally, a large number of threads (5-15) seem to be running every time I run the executable. I am not sure why this happens, but since I request 1 core on the GRID, this really overwhelms the CPU (with CPU efficiency reaching 1800% for a 400MB ROOT file).

I am trying to understand how I can avoid this, and whether I am doing something wrong in the way I set up this code that I could improve to make it more stable. Do you have any suggestions?

This is with ROOT v6.28/00.

Hi @MoAly98,

IIUC, you are saying that ROOT spawns multiple threads even though `ROOT::EnableImplicitMT()` was not called? Can you confirm that no other part of your program calls that function? If you are not sure, you could try calling `ROOT::DisableImplicitMT()` right before any computation is done (perhaps even before the loop you showed above).

Also, CC’ing @vpadulan to see if he can shed some more light.

Cheers,
J.

Hi @jalopezg

Yeah that’s what I seem to see.

I see multiple threads running my executable. The screenshot here was taken after compiling with `DisableImplicitMT()`. I am pretty sure there are no other parts of the code explicitly enabling MT. I also tried with a fresh build directory.

Hi @MoAly98 ,

you don’t even need `DisableImplicitMT()`: if you don’t call `EnableImplicitMT()`, RDF spawns no threads.

If you read data via xrootd (that is, if the file names start with `root://`), xrootd spawns a number of helper threads, but they should be mostly idle. Indeed, the htop screenshot above shows all threads except the last one idling, with no CPU usage. You can check what those threads are doing by running the program under `gdb`, stopping it mid-execution with Ctrl-C and then running `info threads` or `thread apply all bt 5`.
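Attaching `gdb` to a grid job is not always practical. As a lighter-weight alternative, the job itself could log its own thread count at a few points. The sketch below is Linux-only and assumes the `Threads:` field of `/proc/self/status`; the helper names `CountProcessThreads` and `ReportThreads` are hypothetical, not ROOT API.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (Linux-only): returns the number of threads in the
// current process, as reported by the "Threads:" line of /proc/self/status,
// or -1 if that file cannot be read (e.g. on macOS).
int CountProcessThreads() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("Threads:", 0) == 0) {      // line starts with "Threads:"
            return std::stoi(line.substr(8));      // stoi skips the leading tab
        }
    }
    return -1;
}

// Hypothetical usage helper: print the thread count at a labelled point, e.g.
// before building the RDataFrame and after the Snapshot, to see which step
// makes the count jump.
void ReportThreads(const std::string& where) {
    std::cout << where << ": " << CountProcessThreads() << " thread(s)\n";
}
```

Comparing the output before and after each suspect step (opening the file, defining columns, running the event loop) narrows down which component creates the extra threads.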

So I can’t tell where the 1800% CPU efficiency comes from. That should only be possible with `EnableImplicitMT()`. Are you sure that, when submitting to the grid, the version of the program without the `EnableImplicitMT()` call is picked up?

Sorry I cannot be more helpful,
Enrico

Hi @eguiraud,
Sorry for the late reply, I was offline. I am also very confused by the behaviour. I made a fresh build directory and tried cloning my repository from scratch and compiling, but all attempts seem to run on the grid with ~125-130 threads and ~1000% CPU efficiency, causing most jobs to be killed after a really long time.

The screenshot I showed was taken while running over a local file (EOS). Locally I sometimes see 10-15 threads running, but I can’t seem to reproduce the huge number of threads seen on the GRID.

I will try running the program under valgrind, and I will also try enabling MT while limiting the number of threads to 5, to see how the grid jobs respond.

Thanks a lot,
Mo

Indeed, calling `EnableImplicitMT(5)` should not spawn more than 5 RDF threads (plus the xrootd threads I mentioned, which should be mostly idle).

Hi,

Just to close this: it turns out the multi-threading was coming from the default behaviour of XGBoost, which we used in one of our branches in the RDF code. So this is not related to ROOT.
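For anyone hitting the same problem: as far as I know, XGBoost parallelises with OpenMP and, when its `nthread` parameter is not set, uses as many threads as it can find, regardless of how many cores the batch slot was granted. A minimal sketch of deriving one core budget and passing it to every multithreaded component is below. It assumes the budget arrives in the `OMP_NUM_THREADS` environment variable (some batch systems, HTCondor among them, can export it; check what your site does). `CoreBudget` is a hypothetical helper, and the ROOT and XGBoost calls appear only as comments because they need those libraries.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: number of cores this job may use, read from an
// environment variable (OMP_NUM_THREADS by default). Returns `fallback` when
// the variable is unset, empty, not a plain positive integer, or out of range.
unsigned CoreBudget(const char* var = "OMP_NUM_THREADS", unsigned fallback = 1) {
    const char* value = std::getenv(var);
    if (value == nullptr || *value == '\0') return fallback;
    char* end = nullptr;
    errno = 0;
    const long n = std::strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 1 || n > INT_MAX) return fallback;
    return static_cast<unsigned>(n);
}

// How the budget would be applied (requires ROOT and XGBoost, so comments only):
//
//   const unsigned n = CoreBudget();
//   if (n > 1) ROOT::EnableImplicitMT(n);      // RDF worker threads
//   XGBoosterSetParam(booster, "nthread",      // XGBoost's own thread pool
//                     std::to_string(n).c_str());
```

Because OpenMP runtimes usually read `OMP_NUM_THREADS` when they initialise, exporting it in the job script before launching the executable is generally more reliable than setting it from inside the program.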

Sorry about the noise :)

