Evaluating MVA within Distr RDF

Hello experts,

I would like to evaluate a BDT output within the Distributed RDF. I have searched for relevant information but haven’t found any examples. In my case, the BDT model from TMVA is stored in an XML file.

Below is a toy example — this example works with a (normal) RDF but crashes when used with the Distributed RDF.

I would kindly like to ask for your advice on how to fix it.

Best,
Jindrich

import ROOT
from distributed import Client
from dask_jobqueue import SLURMCluster
import distributed

def create_remote_connection():

    python = "singularity exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmssw/el9:x86_64 python3"
    cluster = SLURMCluster(
        job_name="test",
        cores=1,
        memory='2GB',
        python=python
    )

    cluster.scale(2)
    cluster.adapt(minimum=0, maximum=2)
    client = Client(cluster, heartbeat_interval='5s', timeout='60s')

    print(cluster.job_script())

    return client

if __name__ == "__main__":

    client = create_remote_connection()
    print(client)


    files = ['tmva_example.root']
    NPARTITIONS = 2

    ## Distributed RDF
    RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
    df = RDataFrame("Events", files, daskclient=client, npartitions=NPARTITIONS)
    
    #df = ROOT.RDataFrame("Events",files)

    model = ROOT.TMVA.Experimental.RReader("TMVAClassification_BDT.weights.xml")
    variables = model.GetVariableNames()
    df = df.Define('BDT_score',ROOT.TMVA.Experimental.Compute[4, "float"](model),list(variables))
    h = df.Histo1D(("BDT_score", "BDT Score", 100, -1, 1), "BDT_score")

    h1 = df.Histo1D(("var1", "var1", 100, 0, 1), "var1")
    h2 = df.Histo1D(("var2", "var2", 100, 0, 1), "var2")
    
    # Save the results
    file = ROOT.TFile("test.root", "RECREATE")
    h.Write()
    h1.Write()
    h2.Write()
    file.Close()

The error for DistrRDF is as follows:

TBufferFile::WriteObjectAny:0: RuntimeWarning: since TMVA::Experimental::Internal::ComputeHelper<integer_sequence<unsigned long,0,1,2,3>,float,TMVA::Experimental::RReader&> has no public constructor
which can be called without argument, objects of this class
can not be read with the current library. You will need to
add a default constructor before attempting to read it.
TStreamerInfo::Build:0: RuntimeWarning: TMVA::Experimental::Internal::ComputeHelper<integer_sequence<unsigned long,0,1,2,3>,float,TMVA::Experimental::RReader&>: TMVA::Experimental::RReader& has no streamer or dictionary, data member “fFunc” will not be saved
*** Break *** segmentation violation


ROOT Version: From tags/6-34-02@6-34-02
Platform: Built for linuxx8664gcc on Jan 31 2025, 14:36:25
Compiler: g++ (GCC) 13.1.0


May be @moneta can help you ?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.