RDataFrame: "vectors of different sizes" error

Hi, I’m getting the error message below.

line 479, in <module>  hNQvT.Add(i.GetValue())
cppyy.gbl.std.runtime_error: const TH2D& ROOT::RDF::RResultPtr<TH2D>::GetValue() =>
    runtime_error: Cannot call operator / on vectors of different sizes.

and below is part of my code

df = rt.RDataFrame("Events", [argv[1],argv[2],argv[3],argv[4],argv[5]])
#df=df.Range(0,100)
HList=[]     #holds all histogram
hb_NormQ = defaultdict(dict)  #Linearized charge, normalized by the amplitude of SOI (hb_fc3)

s_QSum = 'hb_fc0+hb_fc1+hb_fc2+hb_fc3+hb_fc4+hb_fc5+hb_fc6+hb_fc7'
df = df.Define('QSum',s_QSum) # sum of charges in from all time samples, for HB

#Bunch of G matrices....

#generate signal amplitudes
matrices={-10:Gm10 ,-9:Gm9, -8:Gm8, -7:Gm7, -6:Gm6, -5:Gm5, -4:Gm4, -3:Gm3, -2:Gm2, -1:Gm1, 0:G0, 1:G1, 2:G2, 3:G3, 4:G4, 5:G5, 6:G6, 7:G7, 8:G8, 9:G9, 10:G10, 11:G11, 12:G12, 13:G13, 14:G14, 15:G15, 16:G16, 17:G17, 18:G18, 19:G19, 20:G20, 21:G21}

sig_amp = defaultdict(dict)

def MatMul(M,i):
        return("({})*hb_fc0+({})*hb_fc1+({})*hb_fc2+({})*hb_fc3+({})*hb_fc4+({})*hb_fc5+({})*hb_fc6+({})*hb_fc7".format(M[i][0],M[i][1],M[i][2],M[i][3],M[i][4],M[i][5],M[i][6],M[i][7]))

sig_amp = defaultdict(dict)

for tshift in range(-10,21):
    for tslice in range(8):
        sig_amp[tshift][tslice]=MatMul(matrices[tshift],tslice)

#make RDataFrame with amplitides
for tshift in range(-10,21):
    for tslice in range(8):
        varName = str(tshift).replace("-","m")
        df = df.Define(f"tshift_{tslice}_{varName}",f"tshift=={tshift}").Define(f"sig_amp_{tslice}_{varName}",sig_amp[tshift][tslice])

#make GoodPulse cut

for tshift in range(-10,21):
    varName = str(tshift).replace("-","m")
    df=df.Define(f"GoodPulse_{varName}",f"0.1*(sig_amp_3_{varName})>sig_amp_2_{varName} && 0.1*(sig_amp_3_{varName})>sig_amp_4_{varName} && sig_amp_0_{varName}+sig_amp_1_{varName}+sig_amp_5_{varName}+sig_amp_6_{varName}+sig_amp_7_{varName}<10000")

#make Norm_Q
for tshift in range(-10,21):
    varName = str(tshift).replace("-","m") 
    for tslice in range(8):
        df=df.Define(f"cut_hb_fc{tslice}_{varName}",f"hb_fc{tslice}[GoodPulse_{varName}]")
        df = df.Define(f"RealTime{tslice}_{varName}",f"0*cut_hb_fc{tslice}_{varName}+25*{tslice}-(tshift)")
        df = df.Define(f"hb_NormQ_{tslice}_{varName}", f"cut_hb_fc{tslice}_{varName}/sig_amp_3_{varName}")
    df = df.Define(f"EWeight{varName}",f"cut_hb_fc0_{varName}+cut_hb_fc1_{varName}+cut_hb_fc2_{varName}+cut_hb_fc3_{varName}+cut_hb_fc4_{varName}+cut_hb_fc5_{varName}+cut_hb_fc6_{varName}+cut_hb_fc7_{varName}")
    #df=df.Define(f"QSum_ampcut_{varName}",f"QSum[GoodPulse_{varName}]")


#make plots
htemp1=[]
htemp2=[]

for tshift in range(-10,21):
    for tslice in range(8):
        varName = str(tshift).replace("-","m")
        htemp1.append(df.Histo2D((f"hNQvT_{tslice}{varName}","",200,0,200,200,0,2.0),f"RealTime{tslice}_{varName}",f"hb_NormQ_{tslice}_{varName}"))
        htemp2.append(df.Histo2D((f"hNQvT_EnW_{tslice}{varName}","",200,0,200,200,0,2.0),f"RealTime{tslice}_{varName}",f"hb_NormQ_{tslice}_{varName}",f"EWeight{varName}"))


hNQvT = rt.TH2D("hNQvT","Phase aligned normal pulses;Time [ns];Normalized charge [fC]",200,0,200,200,0,2)
hNQvT_EnW = rt.TH2D("hNQvT_EnW","Phase aligned normal pulses | Energy weighted;Time [ns];Normalized charge [fC]",200,0,200,200,0,2.0)

for i in htemp1:
    hNQvT.Add(i.GetValue())

for i in htemp2:
    hNQvT_EnW.Add(i.GetValue())

HList.append(hNQvT)
HList.append(hNQvT_EnW)
HList.append(df.Histo1D(("hTShift",";tshift [ns]",30,-10.5,19.5),'tshift'))
HList.append(df.Histo1D(("hQSum","Charge summed of all time samples;charge [fC]",100,0,2e6),"QSum"))

tf_out = rt.TFile(argv[6],'RECREATE')
for hh in HList:
    hh.Write()
tf_out.Close()

I believe this is because the histograms I’m appending to the list htemp1 have different numbers of x and y values, but I don’t think they should. Am I missing something? I expect the same number of x and y values because I applied the [GoodPulse] cut to make cut_hb_fc and used that to build both the x value (RealTime) and the y value (hb_NormQ).


ROOT Version: 6.26/11
Platform: Ubuntu
Compiler: python3


Hello,

Thanks for posting, and welcome to the forum!
The code you posted is sophisticated (it does many things, as it should) and a bit hard to read. We do not have indications of bugs in the detection of input collections of different sizes so far.

My suggestion would be to add somewhere a debug node to check what is going on, perhaps a filter like "cout << rdfentry_; return true;" to check at what entry the problem occurs and then to check the content of that row with the Display method.

Let us know how it goes.

Cheers,
Danilo

It says rdfentry_ is not defined. I didn’t put return True because it said it needs to be inside a function.

Apologies for that.
Could you try

Define("x", "rdfentry_").Filter("cout << x << endl; return true;")

That should work for 6.26 and given it’s debugging it might be considered ok-ish.

Cheers,
Danilo

I did

df=df.Define("x", "rdfentry_").Filter("print(x), return True")

since this is python3, but I might have written something nonsensical (sorry, I am not familiar with mixing C++ and Python) because it gave me errors.

Hi,
I might be off but since the error is

Cannot call operator / on vectors of different sizes.

and the only occurrence of the / operator in the code is in this expression:

df.Define(f"hb_NormQ_{tslice}_{varName}",
          f"cut_hb_fc{tslice}_{varName}/sig_amp_3_{varName}")

the problem is likely that variables cut_hb_fc{tslice}_{varName} and sig_amp_3_{varName} are arrays of different sizes (at least for one combination of tslice and varName). You can verify that with a printout like Danilo suggests (using C++'s cout rather than Python’s print in the string expression).
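
As a rough sketch of such a printout: the helper below only builds the C++ string that would be handed to Filter, so it runs without ROOT. The function name size_check_filter and its arguments are made up for illustration, and the entry column "x" is assumed to come from Define("x", "rdfentry_") as suggested earlier in the thread.

```python
def size_check_filter(col_a, col_b, entry_col="x"):
    """Build a C++ Filter expression that reports entries where two
    array columns have different lengths.

    The returned string is compiled by ROOT as C++, so it uses
    std::cout and `return true;`, not Python's print / True.
    """
    return (
        f'if ({col_a}.size() != {col_b}.size()) '
        f'std::cout << "entry " << {entry_col} '
        f'<< ": {col_a}.size()=" << {col_a}.size() '
        f'<< " {col_b}.size()=" << {col_b}.size() << std::endl; '
        'return true;'  # always pass, the filter exists only for its printout
    )


# Usage sketch (assumes a ROOT session and the `df` from the question,
# booked after the two columns have been defined):
# df = df.Define("x", "rdfentry_").Filter(
#     size_check_filter("cut_hb_fc0_m10", "sig_amp_3_m10"))
print(size_check_filter("cut_hb_fc0_m10", "sig_amp_3_m10"))
```

Because the filter always returns true, it should not change which entries reach the histograms. It only prints a line for the entries where the two sizes disagree.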

I hope this helps!
Enrico


Hi, my error was indeed from this part. I fixed it by adding [GoodPulse] cut on sig_amp_3_{varName}. Thank you so much!
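
For anyone landing here later, a minimal sketch of what that change might look like. The exact line used in the final code was not posted, so the form below is an assumption; the helper name norm_q_expr is hypothetical, and the column names follow the code in the question.

```python
def norm_q_expr(tslice, var_name):
    """Build the hb_NormQ expression with the same mask on both operands."""
    # Numerator: already masked in the original code via
    # hb_fc{tslice}[GoodPulse_{varName}]
    numerator = f"cut_hb_fc{tslice}_{var_name}"
    # Denominator: apply the same GoodPulse mask so both arrays
    # have the same length before the element-wise division
    denominator = f"sig_amp_3_{var_name}[GoodPulse_{var_name}]"
    return f"{numerator}/{denominator}"


# Usage sketch inside the original loops (needs ROOT and the `df` from the question):
# df = df.Define(f"hb_NormQ_{tslice}_{varName}", norm_q_expr(tslice, varName))
print(norm_q_expr(3, "m10"))
```

The point is that cut_hb_fc only keeps the elements passing GoodPulse, while the unmasked sig_amp_3 still has one element per channel, so the two arrays differ in length whenever at least one element fails the cut.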
