Flatten multi-dimensional arrays in RDataFrame

Hello,

I am analyzing a ROOT file generated with GEANT4. It contains branches that hold 2- or 3-dimensional arrays, for example:

ntrkhits_pandoraTrack[ntracks_pandoraTrack][3]/S
trkresrg_pandoraTrack[ntracks_pandoraTrack][3][2000]/F

I am attempting to use RDataFrame for offline post-processing, i.e. flattening those multi-dimensional arrays into 1D branches, for example:

// reading the 2D array as if the 2D array is unrolled into 1D in TTree, n_x+n_y*i
auto df2 = df1.Define( "ntrkhits_Uplane" , "ntrkhits_pandoraTrack[ntracks_pandoraTrack+3*0]" )
.Define( "ntrkhits_Vplane" , "ntrkhits_pandoraTrack[ntracks_pandoraTrack+3*1]" )
.Define( "ntrkhits_WZplane" , "ntrkhits_pandoraTrack[ntracks_pandoraTrack+3*2]" );

However, I get the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  TTree leaf ntrkhits_pandoraTrack has both a leaf count and a static length. This is not supported.

What is the correct, or a better, way to read a 2D array in RDataFrame?

Thank you very much!
Siewyan

_ROOT Version:6.22/06
_Platform: CentOS 7
_Compiler: g++


@eguiraud can most probably give you some hints

Hi @SiewYan ,
RDataFrame does not support 2D arrays well, because TTreeReader, which RDF uses internally, does not support 2D arrays well. The error you get is basically TTreeReader saying “I don’t know how to read this branch”.

But depending on the exact situation there might be a workaround – could you please share this data (even just a couple of events) with me so I can experiment a bit?

Cheers,
Enrico

Hello @eguiraud , thank you for your answer. Please find the data at the link [*]. It would be great if there were a workaround for reading and manipulating 2D arrays in RDataFrame.

Thanks, and looking forward to hearing from you.

Cheers,
Siewyan

[*] CERNBox

Hi @SiewYan ,
thank you for the data, I will take a look as soon as possible.

Cheers,
Enrico

Hello @eguiraud , may I ask how it is going? Thanks!

Cheers,
Siewyan

Hi @SiewYan ,
sorry for the high latency. I can reproduce the problem, but I could not find a workaround during my first investigation. I’ll give it another go asap.

Cheers,
Enrico

Ah, on a closer look, I know what this is. It looks like an instance of ROOT-8827, "TTreeReaderArray<T> does not work for VLEN branches whose length branch is not of type int". The problem is not that the array is 2-dimensional (we are reading a flattened version anyway) but that the branch holding its size, ntracks_pandoraTrack, is of type short.
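
To make "reading a flattened version" concrete, here is a minimal sketch of the index arithmetic, assuming a branch declared as [ntracks][3] arrives as one contiguous row-major buffer in which the plane index varies fastest. The names kNPlanes, FlatIndex and SelectPlane are hypothetical helpers for illustration, not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the static inner dimension [3] of the branch, i.e. the number
// of planes stored per track.
constexpr std::size_t kNPlanes = 3;

// Row-major position of element [track][plane] in the flattened buffer.
inline std::size_t FlatIndex(std::size_t track, std::size_t plane)
{
   return track * kNPlanes + plane;
}

// Hypothetical helper: collect one value per track for the given plane.
template <typename T>
std::vector<T> SelectPlane(const std::vector<T> &flat, std::size_t plane)
{
   assert(plane < kNPlanes);
   std::vector<T> out;
   out.reserve(flat.size() / kNPlanes);
   // Walk over tracks until the computed index runs past the buffer.
   for (std::size_t track = 0; FlatIndex(track, plane) < flat.size(); ++track)
      out.push_back(flat[FlatIndex(track, plane)]);
   return out;
}
```

With this layout, the entry for plane p of track i sits at i*3 + p, so a per-plane column is a strided selection from the flat buffer rather than a single element.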

This is a limitation of TTreeReader (which RDF uses for reading internally). A TTreeReader-only reproducer:

void repro()
{
   TFile f("MUSUN_dunefd_1485_gen_g4filt_detsim_freco_ana.root");
   auto *t = f.Get<TTree>("analysistree/anatree");
   R__ASSERT(t != nullptr);
   TTreeReader r(t);
   TTreeReaderArray<float> idx(r, "trkdedx_pandoraTrack");
   r.Next();
}

results in:

~/S/w/forum_treereader_2dim_arrays root -l repro.C
root [0]
Processing repro.C...
Error in <TTreeReaderValueBase::CreateProxy()>: The branch ntracks_pandoraTrack contains data of type short. It cannot be accessed by a TTreeReaderValue<int>

So the only workaround I can propose would be to regenerate the file with a different size type (int or unsigned int) or to pre-process that file, without RDF, to do the conversion.

I realize that this is frustrating. You mentioned this file is generated from GEANT4, and we should be able to read files generated by GEANT. I’ll check how we can tackle this. FYI @Axel @pcanal .

Cheers,
Enrico

@eguiraud thank you very much for taking the time to troubleshoot. It looks like this is the bottleneck for using RDF in my work; however, changing the type to int or unsigned int is not so trivial (I reckon). I will need to ask my colleague about regenerating the ROOT file as you suggest.

On the other hand, it would be great if support for this were included in a future release.

I am also open to other suggestions, in case there is a small hack to get around it.

Looking forward to hearing from you all!

Cheers,
Siewyan

Hi @SiewYan , this is now fixed in the ROOT master branch. Tomorrow’s nightly builds will already contain the fix.

Cheers,
Enrico