Hello experts,
I would like to evaluate a BDT within a distributed RDataFrame. I have searched for relevant information but haven’t found any examples. In my case, the BDT model was trained with TMVA and is stored in an XML weights file.
Below is a toy example. It works with a regular (local) RDataFrame but crashes when used with the distributed RDataFrame.
Could you please advise me on how to fix it?
Best,
Jindrich
import ROOT
from distributed import Client
from dask_jobqueue import SLURMCluster
import distributed

def create_remote_connection():
    python = "singularity exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmssw/el9:x86_64 python3"
    cluster = SLURMCluster(
        job_name="test",
        cores=1,
        memory='2GB',
        python=python
    )
    cluster.scale(2)
    cluster.adapt(minimum=0, maximum=2)
    client = Client(cluster, heartbeat_interval='5s', timeout='60s')
    print(cluster.job_script())
    return client

if __name__ == "__main__":
    client = create_remote_connection()
    print(client)
    files = ['tmva_example.root']
    NPARTITIONS = 2

    ## Distributed RDF
    RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
    df = RDataFrame("Events", files, daskclient=client, npartitions=NPARTITIONS)
    #df = ROOT.RDataFrame("Events",files)

    model = ROOT.TMVA.Experimental.RReader("TMVAClassification_BDT.weights.xml")
    variables = model.GetVariableNames()
    df = df.Define('BDT_score',ROOT.TMVA.Experimental.Compute[4, "float"](model),list(variables))

    h = df.Histo1D(("BDT_score", "BDT Score", 100, -1, 1), "BDT_score")
    h1 = df.Histo1D(("var1", "var1", 100, 0, 1), "var1")
    h2 = df.Histo1D(("var2", "var2", 100, 0, 1), "var2")

    # Save the results
    file = ROOT.TFile("test.root", "RECREATE")
    h.Write()
    h1.Write()
    h2.Write()
    file.Close()
The error with the distributed RDataFrame is as follows:
TBufferFile::WriteObjectAny:0: RuntimeWarning: since TMVA::Experimental::Internal::ComputeHelper<integer_sequence<unsigned long,0,1,2,3>,float,TMVA::Experimental::RReader&> has no public constructor
which can be called without argument, objects of this class
can not be read with the current library. You will need to
add a default constructor before attempting to read it.
TStreamerInfo::Build:0: RuntimeWarning: TMVA::Experimental::Internal::ComputeHelper<integer_sequence<unsigned long,0,1,2,3>,float,TMVA::Experimental::RReader&>: TMVA::Experimental::RReader& has no streamer or dictionary, data member "fFunc" will not be saved
*** Break *** segmentation violation
ROOT Version: From tags/6-34-02@6-34-02
Platform: Built for linuxx8664gcc on Jan 31 2025, 14:36:25
Compiler: g++ (GCC) 13.1.0
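The warnings above suggest that the `ComputeHelper` object holding the `RReader` reference cannot be serialized and shipped to the workers. One direction that might avoid this is to build the reader on each worker instead and refer to it from a string-based `Define`. The following is only a minimal sketch and has not been tested against ROOT 6.34.02. The helpers `reader_declaration`, `score_expression`, `declare_code` and the C++ function name `bdt_reader` are hypothetical names introduced for illustration. It assumes that `ROOT.RDF.Experimental.Distributed.initialize` is available in this build, that the XML file is readable at the same path on every worker, and that the input columns can be cast to `float`.

```python
def reader_declaration(xml_path, func_name="bdt_reader"):
    """Return C++ code for a function that lazily builds one RReader per process.

    The include guard keeps a repeated Declare in the same process harmless.
    """
    guard = f"{func_name.upper()}_DECLARED"
    return (
        f"#ifndef {guard}\n"
        f"#define {guard}\n"
        '#include "TMVA/RReader.hxx"\n'
        f"TMVA::Experimental::RReader &{func_name}() {{\n"
        f'    static TMVA::Experimental::RReader reader("{xml_path}");\n'
        "    return reader;\n"
        "}\n"
        "#endif\n"
    )


def score_expression(variables, func_name="bdt_reader"):
    """Return a Define expression that evaluates the reader on the given columns."""
    args = ", ".join(f"static_cast<float>({v})" for v in variables)
    return f"{func_name}().Compute(std::vector<float>{{{args}}})"


def declare_code(code):
    """Initialization function: declare C++ code in the current process."""
    import ROOT  # imported here so the name also resolves on the workers
    ROOT.gInterpreter.Declare(code)


# Hypothetical usage in the script above (requires ROOT and a Dask client):
#   code = reader_declaration("TMVAClassification_BDT.weights.xml")
#   ROOT.RDF.Experimental.Distributed.initialize(declare_code, code)
#   df = df.Define("BDT_score", score_expression(list(variables)))
```

With this approach only strings travel to the workers, so nothing holding an `RReader` has to be pickled. As with `Compute[4, "float"]`, the resulting column is a `std::vector<float>` of outputs rather than a single number.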