I am working on a simple example to understand how to define a new column using mathematical operations such as sqrt, abs, and log. In particular, I found that the log operation returns nan values in both C and Python.
To illustrate the issue, the C version is shown below:
I was thinking this could be it. However, I compared the histogram of log(abs(Electron_dxy)) from an interactive ROOT session with the one from Histo1D("logd"). The former returns a valid distribution; the latter is an empty canvas…
How do you produce the first histogram? With TTree::Draw? It’s possible that TTree::Draw discards invalid histogram values on the fly (@pcanal might be able to confirm).
Does d.Filter("Electron_dxy > 0").Define("logd" , "log(Electron_dxy)").Mean("logd") produce the expected results?
TTree::Draw converts nans to zeros on the fly, RDataFrame does not. On one hand TTree::Draw’s behavior is nicer; on the other, it might silently hide issues in your calculations
ROOT histograms that have been filled with nans cannot be drawn, as you experienced
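For illustration, here is a minimal sketch in plain C++ (standard library only, not the RDataFrame API) of why a single non-positive entry is enough to spoil a result. The helper names `logAll` and `mean` are hypothetical, and the sketch assumes IEEE floating-point behavior, where `std::log` of a negative number yields nan:

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: natural log of every element.
// Negative inputs yield nan, zero yields -inf.
std::vector<double> logAll(const std::vector<double>& values) {
    std::vector<double> out;
    out.reserve(values.size());
    for (double v : values) out.push_back(std::log(v));
    return out;
}

// Hypothetical helper: arithmetic mean.
// A single nan in the input makes the whole sum, and so the mean, nan.
double mean(const std::vector<double>& values) {
    if (values.empty()) return std::nan("");
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}
```

Calling `mean(logAll({0.5, -0.2, 1.5}))` gives nan even though two of the three inputs are valid, which is the kind of problem that dropping or zeroing invalid values on the fly would hide.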
I guess Electron_dxy is an array of values, so what we are returning is an array of true/false values.
In that case, you can use RVec’s masking feature instead to discard the non-positive elements before taking the logarithm:
auto zMean = d.Define("logd" , "log(Electron_dxy[Electron_dxy > 0])").Mean("logd");
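To make the masking idea concrete without needing a ROOT installation, here is a rough plain-C++ analogue of `log(v[v > 0])`. It is only a sketch: `logOfPositive` is a hypothetical name, and `std::vector` stands in for RVec:

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for log(v[v > 0]): keep only the strictly
// positive elements, then take their natural log. The result can be
// shorter than the input, and it never contains nan or -inf.
std::vector<double> logOfPositive(const std::vector<double>& values) {
    std::vector<double> out;
    for (double v : values) {
        if (v > 0.0) out.push_back(std::log(v));
    }
    return out;
}
```

Note that the mask changes the number of elements per event, so any other per-element column combined with the result would need the same mask applied.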