Array filtering in RDataFrame returning indices


ROOT Version: 6.26
Platform: Not Provided
Compiler: Not Provided


I have a dataframe built from a tree with a branch ‘ACR’ holding a C array. For each event, I want an array of the indices of the ACR elements that pass a certain cut.
E.g. given the cut ACR < 5, I want the indices of all elements of ACR whose value is less than 5. A mask array (an array of true/false values) might also work.

Did you try to search the forum? Search results for 'array filtering rdataframe' - ROOT Forum
And if you don’t find something relevant, I guess @eguiraud can give you some hints

Hi,

that would be a Define("mask", "ACR < 5"), see also this section of the docs.

Cheers,
Enrico
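To make the idea concrete, here is a minimal sketch in plain C++ of what such an element-wise mask contains and how indices can be derived from it. It deliberately uses std::vector instead of ROOT types so that it compiles without ROOT; the function names less_than_mask and nonzero_indices and the float element type are assumptions made for illustration, not part of the RDataFrame API. In ROOT itself, ROOT::VecOps::Nonzero applied to the mask should play the role of the second function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison, 1 where values[i] < cut, 0 elsewhere.
// This mirrors what an expression such as "ACR < 5" evaluates to on an array column.
std::vector<int> less_than_mask(const std::vector<float>& values, float cut)
{
    std::vector<int> mask;
    mask.reserve(values.size());
    for (float v : values)
        mask.push_back(v < cut ? 1 : 0);
    return mask;
}

// Hypothetical helper: positions of the non-zero entries of a mask.
std::vector<std::size_t> nonzero_indices(const std::vector<int>& mask)
{
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] != 0)
            indices.push_back(i);
    return indices;
}
```

For an event with ACR = {1, 7, 3, 5, 0.5} and the cut ACR < 5, the mask is {1, 0, 1, 0, 1} and the indices are {0, 2, 4}.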

I did search the forum, but maybe with the wrong phrases, so I couldn’t find
relevant results. Sorry.

What will be the type of “mask” in the above example?
I want to apply some functions to the indices returned in mask for each entry.
So I will be using RDataFrame::Foreach(), with Foreach calling a function
only on the elements where mask is true. The callable ‘F’ passed to Foreach
therefore has to take an argument of the mask’s type. I tried this in the ROOT
terminal and got the following result, which is surprising. I expected the type
to be RVec<Bool_t>.

root [0] auto rdf = ROOT::RDataFrame("EventParameters","/home/chinmay/DataComparison2022/09Nov2022/Z05_35_G21k_2.6_6.75_8.0/P014_Z05_35_G21k_2p6_6p75_8p0.root")
(ROOT::RDataFrame &) A data frame built on top of the EventParameters dataset.
root [1] auto filterCol = rdf.Define("filterCol","HillasParsHighGain.NumSaturatedCells < 3")
(ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> &) @0x7f28522270d0
root [2] filterCol.GetColumnType("filterCol")
(std::string) "bool"
root [3] 

In the above case, HillasParsHighGain.NumSaturatedCells is a variable-length C array of integers.
(It is a member of a class whose dictionary is generated with rootcling.)
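The per-entry step described above (calling a function only on the elements where the mask is true) could be sketched as follows. This is plain C++ with std::vector rather than RVec, so it does not depend on ROOT; the name for_each_selected and its signature are hypothetical, and the mask element type is left as a template parameter because it is not yet established at this point in the thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: calls f(index, value) for every element whose mask
// entry converts to true. values and mask must have the same length.
template <typename T, typename M, typename F>
void for_each_selected(const std::vector<T>& values, const std::vector<M>& mask, F&& f)
{
    if (values.size() != mask.size())
        throw std::invalid_argument("values and mask must have the same length");
    for (std::size_t i = 0; i < values.size(); ++i)
        if (mask[i])
            f(i, values[i]);
}
```

A callable passed to Foreach could then take the array column and the mask column as its two arguments and forward them to a helper like this one.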

For technical reasons it’s going to be an RVec<int> rather than an RVec<bool>, but it will have the contents you expect (0 for filtered out elements, 1 for accepted elements) and you can use the mask to copy the selected elements out of RVecs with the same size as the mask, e.g. ACR[ACR < 5] returns a new RVec with just the elements of ACR that are smaller than 5.
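As a rough illustration of the masked selection described above, the following plain C++ sketch reproduces the semantics of ACR[ACR < 5] with std::vector, so it compiles without ROOT. The function name select_by_mask and the float element type are assumptions for the example; with RVec the same result comes from the indexing expression directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies out, in order, the elements whose mask entry
// is non-zero. The mask holds ints (0 or 1), matching the RVec<int> case.
std::vector<float> select_by_mask(const std::vector<float>& values, const std::vector<int>& mask)
{
    assert(values.size() == mask.size() && "mask and values must have the same length");
    std::vector<float> selected;
    for (std::size_t i = 0; i < values.size(); ++i)
        if (mask[i] != 0)
            selected.push_back(values[i]);
    return selected;
}
```

With values {1, 7, 3, 5, 0.5} and mask {1, 0, 1, 0, 1}, the result is {1, 3, 0.5}.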

Indeed that’s surprising. The expected behavior is something like this:

root [0] ROOT::RDataFrame(10)\
  .Define("vec", "ROOT::RVecD{1,2,3}")\
  .Define("mask", "vec > 2")\
  .GetColumnType("mask")
(std::string) "ROOT::VecOps::RVec<int>"

What does GetColumnType("HillasParsHighGain.NumSaturatedCells") return?

Cheers,
Enrico

Okay. Surprisingly, GetColumnType("HillasParsHighGain.NumSaturatedCells") returns (std::string) "UInt_t". However, the definition of the class is as follows:

class HillasParameters : public TObject
{
	public:
		/* ====================  LIFECYCLE     ======================================= */
		HillasParameters () ;                              /* constructor */

		void Init(Int_t pixels) ;

		virtual ~HillasParameters () ;

		void Reset() ; 

		HillasParameters& operator = (const HillasParameters& hpars) ;

		/* ====================  ACCESSORS     ======================================= */

		/* ====================  MUTATORS      ======================================= */
		Bool_t finite ;                          ///< If parameter values are finite 
		Int_t pixels ;                          ///< Number of pixels 
//		Int_t profiles ;                        ///< Number of profiles 
		Int_t N_Pix ;                           ///< Number of pixels in image 
		Int_t Hit_Pix ;                         ///< Number of pixels in image 
		Float_t Size ;                          ///< size of the event 
		Float_t MeanX ;                         ///< <x> 
		Float_t MeanY ;                         ///< <y> 
		Float_t MeanXY ;                        ///< <xy>  
		Float_t MeanX2 ;                        ///< <xsquare> 
		Float_t MeanY2 ;                        ///< <ysquare> 
		Float_t Dist ;                          ///< distance of the centroid of the image from center 
		Float_t Frac2 ;                         ///< distance of the centroid of the image from center 
		Float_t Length ;                        ///< Length of the image 
		Float_t Width ;                         ///< Width of the image 
		Float_t Azwidth ;                       ///< Width of the image 
		Float_t Asym ;                          ///< Asymmetry of the image 
		Float_t Miss ;                          ///< Miss of the image 
		Float_t Slope ;                         ///< Slope of the image axis 
		Float_t Angle_Xaxis ;                   ///< Angle of image axis w.r.t. camera X-axis
		Float_t Alpha ;                         ///< Alpha of the image 
		Float_t Leakage1 ;                      ///< leakage1 : Fraction of size in outermost ring of camera 
		Float_t Leakage2 ;                      ///< leakage2 : Fraction of size in outermost 2 rings of camera 

		///< array of contents in each pixel in cleaned image
		Float_t *Cleaned_Image ;            //[pixels]
		UInt_t *NumSaturatedCells ;        //[pixels]
		Bool_t *HitPattern ;                 //[pixels]

		///< array of contents in each pixel in cleaned image
//		Float_t *Profile_MeanX ;            //[profiles]

		///< array of contents in each pixel in cleaned image
//		Float_t *Profile_MeanY ;            //[profiles]

		///endcond{CLASSIMP}
		ClassDef(HillasParameters,1) ;
		///endcond

}; /* -----  end of class HillasParameters  ----- */

HillasParsHighGain is the main branch, which is of type ‘HillasParameters’. Why is NumSaturatedCells shown as UInt_t rather than as an array type?

I’m not sure. What does tree->Print() say? If it disagrees with RDF this is a bug and it would be great if you could share the input file so we can debug and fix the problem.

The relevant part of tree->Print() is:

*Branch  :HillasParsHighGain.                                                *
*Entries :   107849 : BranchElement (see below)                              *
*............................................................................*
*Br   44 :HillasParsHighGain.TObject.fUniqueID : UInt_t                      *
*Entries :   107849 : Total  Size=     437189 bytes  File Size  =       7733 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=  56.38     *
*............................................................................*
*Br   45 :HillasParsHighGain.TObject.fBits : UInt_t                          *
*Entries :   107849 : Total  Size=     871351 bytes  File Size  =     143707 *
*Baskets :       59 : Basket Size=      32000 bytes  Compression=   6.05     *
*............................................................................*
*Br   46 :HillasParsHighGain.finite : Bool_t                                 *
*Entries :   107849 : Total  Size=     113158 bytes  File Size  =      25121 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   4.46     *
*............................................................................*
*Br   47 :HillasParsHighGain.pixels : Int_t                                  *
*Entries :   107849 : Total  Size=     436705 bytes  File Size  =       7741 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=  56.26     *
*............................................................................*
*Br   48 :HillasParsHighGain.N_Pix : Int_t                                   *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     133902 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   3.25     *
*............................................................................*
*Br   49 :HillasParsHighGain.Hit_Pix : Int_t                                 *
*Entries :   107849 : Total  Size=     436749 bytes  File Size  =       7333 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=  59.40     *
*............................................................................*
*Br   50 :HillasParsHighGain.Size : Float_t                                  *
*Entries :   107849 : Total  Size=     436617 bytes  File Size  =     333194 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.31     *
*............................................................................*
*Br   51 :HillasParsHighGain.MeanX : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     372690 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.17     *
*............................................................................*
*Br   52 :HillasParsHighGain.MeanY : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     373648 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.17     *
*............................................................................*
*Br   53 :HillasParsHighGain.MeanXY : Float_t                                *
*Entries :   107849 : Total  Size=     436705 bytes  File Size  =     380550 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.14     *
*............................................................................*
*Br   54 :HillasParsHighGain.MeanX2 : Float_t                                *
*Entries :   107849 : Total  Size=     436705 bytes  File Size  =     366826 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.19     *
*............................................................................*
*Br   55 :HillasParsHighGain.MeanY2 : Float_t                                *
*Entries :   107849 : Total  Size=     436705 bytes  File Size  =     370882 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.17     *
*............................................................................*
*Br   56 :HillasParsHighGain.Dist : Float_t                                  *
*Entries :   107849 : Total  Size=     436617 bytes  File Size  =     356705 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.22     *
*............................................................................*
*Br   57 :HillasParsHighGain.Frac2 : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     354730 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.23     *
*............................................................................*
*Br   58 :HillasParsHighGain.Length : Float_t                                *
*Entries :   107849 : Total  Size=     436705 bytes  File Size  =     361536 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.20     *
*............................................................................*
*Br   59 :HillasParsHighGain.Width : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     353328 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.23     *
*............................................................................*
*Br   60 :HillasParsHighGain.Azwidth : Float_t                               *
*Entries :   107849 : Total  Size=     436749 bytes  File Size  =     362766 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.20     *
*............................................................................*
*Br   61 :HillasParsHighGain.Asym : Float_t                                  *
*Entries :   107849 : Total  Size=     436617 bytes  File Size  =     366081 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.19     *
*............................................................................*
*Br   62 :HillasParsHighGain.Miss : Float_t                                  *
*Entries :   107849 : Total  Size=     436617 bytes  File Size  =     370397 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.18     *
*............................................................................*
*Br   63 :HillasParsHighGain.Slope : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     375755 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.16     *
*............................................................................*
*Br   64 :HillasParsHighGain.Angle_Xaxis : Float_t                           *
*Entries :   107849 : Total  Size=     436925 bytes  File Size  =     369492 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.18     *
*............................................................................*
*Br   65 :HillasParsHighGain.Alpha : Float_t                                 *
*Entries :   107849 : Total  Size=     436661 bytes  File Size  =     360416 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   1.21     *
*............................................................................*
*Br   66 :HillasParsHighGain.Leakage1 : Float_t                              *
*Entries :   107849 : Total  Size=     436793 bytes  File Size  =      86098 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   5.06     *
*............................................................................*
*Br   67 :HillasParsHighGain.Leakage2 : Float_t                              *
*Entries :   107849 : Total  Size=     436793 bytes  File Size  =     114499 *
*Baskets :       40 : Basket Size=      32000 bytes  Compression=   3.80     *
*............................................................................*
*Br   68 :HillasParsHighGain.Cleaned_Image :                                 *
*         | Float_t HillasParsHighGain.Cleaned_Image[pixels]                 *
*Entries :   107849 : Total  Size=  472027156 bytes  File Size  =   13002031 *
*Baskets :    15424 : Basket Size=      32000 bytes  Compression=  36.28     *
*............................................................................*
*Br   69 :HillasParsHighGain.NumSaturatedCells :                             *
*         | UInt_t HillasParsHighGain.NumSaturatedCells[pixels]              *
*Entries :   107849 : Total  Size=  472088868 bytes  File Size  =    5363422 *
*Baskets :    15424 : Basket Size=      32000 bytes  Compression=  87.96     *
*............................................................................*
*Br   70 :HillasParsHighGain.HitPattern :                                    *
*         | Bool_t HillasParsHighGain.HitPattern[pixels]                     *
*Entries :   107849 : Total  Size=  118384126 bytes  File Size  =    4191881 *
*Baskets :     3738 : Basket Size=      32000 bytes  Compression=  28.22     *

RDataFrame::Describe() for the same tree gives:

Column                                  Type                            Origin
------                                  ----                            ------
HillasParsHighGain.                     HillasParameters                Dataset
HillasParsHighGain.Alpha                Float_t                         Dataset
HillasParsHighGain.Angle_Xaxis          Float_t                         Dataset
HillasParsHighGain.Asym                 Float_t                         Dataset
HillasParsHighGain.Azwidth              Float_t                         Dataset
HillasParsHighGain.Cleaned_Image        Float_t                         Dataset
HillasParsHighGain.Dist                 Float_t                         Dataset
HillasParsHighGain.Frac2                Float_t                         Dataset
HillasParsHighGain.HitPattern           Bool_t                          Dataset
HillasParsHighGain.Hit_Pix              Int_t                           Dataset
HillasParsHighGain.Leakage1             Float_t                         Dataset
HillasParsHighGain.Leakage2             Float_t                         Dataset
HillasParsHighGain.Length               Float_t                         Dataset
HillasParsHighGain.MeanX                Float_t                         Dataset
HillasParsHighGain.MeanX2               Float_t                         Dataset
HillasParsHighGain.MeanXY               Float_t                         Dataset
HillasParsHighGain.MeanY                Float_t                         Dataset
HillasParsHighGain.MeanY2               Float_t                         Dataset
HillasParsHighGain.Miss                 Float_t                         Dataset
HillasParsHighGain.N_Pix                Int_t                           Dataset
HillasParsHighGain.NumSaturatedCells    UInt_t                          Dataset
HillasParsHighGain.Size                 Float_t                         Dataset
HillasParsHighGain.Slope                Float_t                         Dataset
HillasParsHighGain.TObject              HillasParameters                Dataset
HillasParsHighGain.TObject.fBits        UInt_t                          Dataset
HillasParsHighGain.TObject.fUniqueID    UInt_t                          Dataset
HillasParsHighGain.Width                Float_t                         Dataset
HillasParsHighGain.finite               Bool_t                          Dataset
HillasParsHighGain.pixels               Int_t                           Dataset
HillasParsLowGain.                      HillasParameters                Dataset
HillasParsLowGain.Alpha                 Float_t                         Dataset
HillasParsLowGain.Angle_Xaxis           Float_t                         Dataset
HillasParsLowGain.Asym                  Float_t                         Dataset
HillasParsLowGain.Azwidth               Float_t                         Dataset
HillasParsLowGain.Cleaned_Image         Float_t                         Dataset
HillasParsLowGain.Dist                  Float_t                         Dataset
HillasParsLowGain.Frac2                 Float_t                         Dataset
HillasParsLowGain.HitPattern            Bool_t                          Dataset
HillasParsLowGain.Hit_Pix               Int_t                           Dataset
HillasParsLowGain.Leakage1              Float_t                         Dataset
HillasParsLowGain.Leakage2              Float_t                         Dataset
HillasParsLowGain.Length                Float_t                         Dataset
HillasParsLowGain.MeanX                 Float_t                         Dataset
HillasParsLowGain.MeanX2                Float_t                         Dataset
HillasParsLowGain.MeanXY                Float_t                         Dataset
HillasParsLowGain.MeanY                 Float_t                         Dataset
HillasParsLowGain.MeanY2                Float_t                         Dataset
HillasParsLowGain.Miss                  Float_t                         Dataset
HillasParsLowGain.N_Pix                 Int_t                           Dataset
HillasParsLowGain.NumSaturatedCells     UInt_t                          Dataset
HillasParsLowGain.Size                  Float_t                         Dataset
HillasParsLowGain.Slope                 Float_t                         Dataset
HillasParsLowGain.TObject               HillasParameters                Dataset
HillasParsLowGain.TObject.fBits         UInt_t                          Dataset
HillasParsLowGain.TObject.fUniqueID     UInt_t                          Dataset
HillasParsLowGain.Width                 Float_t                         Dataset
HillasParsLowGain.finite                Bool_t                          Dataset
HillasParsLowGain.pixels                Int_t                           Dataset

The input file is a little large, and producing a reproducer would also take some time. Let me see if I can do it.

It looks like for this TTree schema RDF gets the column type wrong. That’s terrible, sorry about that :confused:

I have never seen this before so we will definitely need an input file, even with just a few events, in order to debug and fix the problem.

It seems that RDF gets the type of variable-length C-style array members of classes wrong. Recently, I changed my I/O code to replace the variable-length C-array branches in the TTree with std::vector branches.
RDF gets the data type wrong for a C-array branch, while it gets it right for a branch holding a std::vector.

Hi @Chinmay ,

yes that’s it, but in general RDF also works with C-style arrays – at least for all our test cases.

Can you please share an example broken input file that I can use to debug what’s going on in this case?

Cheers,
Enrico