TH1F::Clone returns TGraphErrors

Hello, this problem is hard to put in context. The snippet below sits inside a loop, is executed hundreds of times, and fails only occasionally, possibly depending on the inputs:

        theclone = self.histo.Clone()
        try:
            return self.filler(self.iclass(theclone), data, self.binning)
        except TypeError:
            logging.error("self.iclass: %s\nself.histo: %s\ntype(self.histo): %s\ntype(self.histo.Clone()): %s",
                          self.iclass, self.histo, type(self.histo), type(theclone))
            raise

and sometimes I get:


ERROR:root:self.iclass: <class 'ROOT.TH1F'>
self.histo: <ROOT.TH1F object ("h") at 0x2365bf00>
type(self.histo): <class 'ROOT.TH1F'>
type(self.histo.Clone()): <class 'ROOT.TGraphErrors'>
Traceback (most recent call last):
  File "performance.py", line 405, in <module>
    options.histo_binning, options.histo_line, options.reference)
  File "performance.py", line 146, in __init__
    worker = PerformanceWorker(binnedData, bins, estimators, theBinning, label, comb, histo_binning)
  File "/cnfs/homes_fs/home/turra/MVACalib/egammaMVACalibUtils/trunk/python/worker.py", line 53, in __init__
    binVarName, histoBinning)
  File "/cnfs/homes_fs/home/turra/MVACalib/egammaMVACalibUtils/trunk/python/worker.py", line 63, in run
    histograms = np.vectorize(self.histoGetter(histoBinning))(binnedData) # TODO: pass options to c-tor    
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "/cnfs/homes_fs/home/turra/MVACalib/egammaMVACalibUtils/trunk/python/RootTools.py", line 114, in __call__
    return self.filler(self.iclass(theclone), data, self.binning)
TypeError: none of the 6 overloaded methods succeeded. Full details:
  TH1F::TH1F() =>
    takes at most 0 arguments (1 given)
  TH1F::TH1F(const TVectorF& v) =>
    could not convert argument 1
  TH1F::TH1F(const TH1F& h1f) =>
    could not convert argument 1
  TH1F::TH1F(const char* name, const char* title, Int_t nbinsx, Double_t xlow, Double_t xup) =>
    takes at least 5 arguments (1 given)
  TH1F::TH1F(const char* name, const char* title, Int_t nbinsx, const Float_t* xbins) =>
    takes at least 4 arguments (1 given)
  TH1F::TH1F(const char* name, const char* title, Int_t nbinsx, const Double_t* xbins) =>
    takes at least 4 arguments (1 given)

As you can see, type(self.histo.Clone()) is TGraphErrors, but type(self.histo) is TH1F! It doesn’t make sense to me.

Hi,

what does this give you: gROOT.MustClean()

Likewise, for both the histo and the clone, what do these yield: self.histo.TestBit(TObject.kMustCleanup) and theclone.TestBit(TObject.kMustCleanup)

What I’m thinking of, is that the memory regulator has a stale address to a TGraphErrors (previously used/registered) and now runs across a TH1F that has been allocated at that address. PyObject’s are recycled to maintain object identity. PyROOT does not check the type anymore, as it deals with TObjects only at that level. This works, b/c TObjects in their dtor notify any listeners that the object is being deleted. If any of the above code statements yield “False” however, then no notification is sent.
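The mechanism described above can be illustrated with a small pure-Python sketch. It does not use ROOT: FakeTObject, Proxy, Regulator and the address 0x1000 are hypothetical stand-ins, and the assumption that the deleted TGraphErrors lacked the cleanup bit is made only for the sake of the example.

```python
class FakeTObject:
    """Stand-in for a C++ object; `address` plays the role of the raw pointer."""
    def __init__(self, cpp_type, address, must_cleanup=True):
        self.cpp_type = cpp_type
        self.address = address
        self.must_cleanup = must_cleanup


class Proxy:
    """Stand-in for the Python-side wrapper of a C++ object."""
    def __init__(self, cpp_type, address):
        self.cpp_type = cpp_type
        self.address = address


class Regulator:
    """Toy address-keyed cache that recycles proxies to keep object identity."""
    def __init__(self):
        self._by_address = {}

    def retrieve(self, obj):
        # The lookup uses the address only; the type is not compared.
        proxy = self._by_address.get(obj.address)
        if proxy is None:
            proxy = Proxy(obj.cpp_type, obj.address)
            self._by_address[obj.address] = proxy
        return proxy

    def on_delete(self, obj):
        # Only objects flagged for cleanup notify the cache when destroyed.
        if obj.must_cleanup:
            self._by_address.pop(obj.address, None)


reg = Regulator()
graph = FakeTObject("TGraphErrors", 0x1000, must_cleanup=False)
reg.retrieve(graph)
reg.on_delete(graph)  # no notification is sent, so the entry goes stale

histo = FakeTObject("TH1F", 0x1000)  # the allocator reuses the same address
stale = reg.retrieve(histo)
print(stale.cpp_type)  # TGraphErrors
```

The second retrieve() hands back the old proxy because nothing removed it from the cache, which is the symptom reported in the first post.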

Cheers,
Wim

[quote=“wlav”]Hi,

what does this give you: gROOT.MustClean()

Likewise, for both the histo and the clone, what do these yield: self.histo.TestBit(TObject.kMustCleanup) and theclone.TestBit(TObject.kMustCleanup)

[/quote]

Thank you. The problem was quite difficult to reproduce: I had to rerun the same program many times, with the same input and options, to trigger the error.

gROOT.MustClean() -> True
self.histo.TestBit(ROOT.TObject.kMustCleanup) -> True
theclone.TestBit(ROOT.TObject.kMustCleanup) -> False

Additional info: the problematic function is passed as the argument to numpy.vectorize.

Hi,

okay, so that’s one part of the puzzle. The other question is where the TGraphErrors object is coming from? I doubt that it, too, is the result of Clone(), since Clone() results in an object owned by Python. Thus, if that object was deleted on the C++ side, it should result in a crash at some later point.

Here’s my thinking: the memory regulator works both ways. On the python side, the fact that a TGraphErrors is found, means that there must be a live python object with a (now dangling) pointer to the deleted TGraphErrors. The solution, then, is to ‘del’ that reference on the python side. Any idea where such a TGraphErrors may be created? I’d look for it on the current function frame or in the global space.
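One way to hunt for such a lingering reference is to scan a namespace for names bound to objects of the suspect class. The following is a minimal sketch, not a PyROOT facility: names_of_type and make_plots are hypothetical helpers, and the empty TGraphErrors class is only a stand-in so that the example runs without ROOT.

```python
def names_of_type(namespace, type_name):
    """Return sorted names in `namespace` whose value's class is called `type_name`."""
    return sorted(name for name, value in namespace.items()
                  if type(value).__name__ == type_name)


class TGraphErrors:  # hypothetical stand-in so the sketch runs without ROOT
    pass


def make_plots():
    g = TGraphErrors()
    found = names_of_type(locals(), "TGraphErrors")
    del g  # drop the reference before the function returns
    remaining = names_of_type(locals(), "TGraphErrors")
    return found, remaining


found, remaining = make_plots()
```

In real code the same helper could be called with globals() to check the global space as well as the current function frame.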

(I’m not familiar with numpy.vectorize; from the docs I understand that it is not executed in parallel, which would change matters, since the memory regulator is not thread-safe from C++ (it relies on the GIL).)

Cheers,
Wim

[quote=“wlav”]Hi,

okay, so that’s one part of the puzzle. The other question is where the TGraphErrors object is coming from? I doubt that it, too, is the result of Clone(), since Clone() results in an object owned by Python. Thus, if that object was deleted on the C++ side, it should result in a crash at some later point.

Here’s my thinking: the memory regulator works both ways. On the python side, the fact that a TGraphErrors is found, means that there must be a live python object with a (now dangling) pointer to the deleted TGraphErrors. The solution, then, is to ‘del’ that reference on the python side. Any idea where such a TGraphErrors may be created? I’d look for it on the current function frame or in the global space.
[/quote]

In my code I create, draw, and save to a file (the same file where the TH1F objects are saved) a lot of TGraphErrors, but they have no relation to any TH1F. In fact, I create the TH1F and TGraphErrors objects in different functions, called at different times.

By the way, the program never crashes on the C++ or the Python side, apart from the problem we are talking about.

[quote=“wlav”](I’m not familiar with numpy.vectorize; from the docs I understand that it is not executed in parallel, which would change matters, since the memory regulator is not thread-safe from C++ (it relies on the GIL).)[/quote]

Yes, you’re right. I was thinking of a multithreading problem because of the random nature of the failure: if I run the same code twice with the same input, sometimes I get the error and sometimes I don’t.

Hi,

That’s puzzling: if there are only local references (i.e. on the function frame) pointing to the objects, they should get their refcounts to zero after the frame gets cleaned up.

Looking at numpy.vectorize, I see that most code is in python, yet they create a ufunc out of the python function. The ufunc appears to allocate its own frame, so perhaps the frame does not clean up (or at least not immediately on exit).

Can you del the local reference to TGraphErrors before the function returns, in the function which creates it?

I’ve added an additional check to RetrieveObject() in the memory regulator, which verifies that the found object proxy and the given TObject instance have the same TClass before recycling. It is in v5-34-00-patches. This should solve these kinds of cases, and if the “wrong” object gets recycled, it should matter little (there’s still the issue that the recycled object may not have the same ownership as intended, so deleting local, dangling references would still be preferred).
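The idea of that check can be sketched in a few lines of plain Python. This is not the actual patch: the dictionary cache, the retrieve function and the address 0x1000 are hypothetical, and evicting the stale entry on a mismatch is an assumption made for this example.

```python
def retrieve(cache, address, cpp_type):
    """Return the cached proxy for `address` only if its recorded type matches."""
    proxy = cache.get(address)
    if proxy is not None and proxy["cpp_type"] != cpp_type:
        # Stale entry: an object of a different type once lived at this address.
        del cache[address]
        proxy = None
    if proxy is None:
        proxy = {"cpp_type": cpp_type, "address": address}
        cache[address] = proxy
    return proxy


# A stale TGraphErrors entry is left behind at an address now used by a TH1F.
cache = {0x1000: {"cpp_type": "TGraphErrors", "address": 0x1000}}
fresh = retrieve(cache, 0x1000, "TH1F")  # mismatch, so a new proxy is built
same = retrieve(cache, 0x1000, "TH1F")   # match, so the proxy is recycled
```

Object identity is still preserved for matching types, since the second lookup returns the very same proxy.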

Cheers,
Wim

Hello, sorry, answering your previous post is too complicated: the software has evolved and the problem is no longer there.

I have a new very similar problem:

for thebin, it in itertools.izip(bins_iterator, transposed_iterator):
    list_graphs = list(it)

    canvas_top = canvas = ROOT.TCanvas()
    if compare_formula:
        canvas.Divide(1, 2)
        canvas_top = canvas.cd(1)

    ydiv, xdiv = int(math.sqrt(len(list_graphs))), int(math.ceil(float(len(list_graphs)) / int(math.sqrt(len(list_graphs)))))
    canvas_top.Divide(xdiv, ydiv)

this works, but if I move:

list_graphs = list(it)

after the canvas division, it sometimes crashes with:
AttributeError: 'TGraphErrors' object has no attribute 'Divide'

and type(canvas_top) is TGraphErrors.

I’m not sure whether this change triggers the error, even though it is the only change I’ve made. When I rerun the code, sometimes I get the error and sometimes I don’t.

Hi,

is this with the head of the v5-34-00-patches branch (i.e. where I’ve added the type check)?

What I don’t understand is that both the TGraphErrors and the divided TCanvas should be live here (on the C++ and Python sides), so I don’t see how their pointers can be equal w/o memory corruption.

Any chance that you can run valgrind on this code? Thanks.

Cheers,
Wim

[quote=“wlav”]Hi,

is this with the head of the v5-34-00-patches branch (i.e. where I’ve added the type check)?

What I don’t understand is that both the TGraphErrors and the divided TCanvas should be live here (on the C++ and Python sides), so I don’t see how their pointers can be equal w/o memory corruption.

Any chance that you can run valgrind on this code? Thanks.

Cheers,
Wim[/quote]

I’m using 5.34/00. How can I use your branch on lxplus?

I can run with valgrind; which tool do you need?

Hi,

nightlies are available in the lcg app area on /afs:

/afs/cern.ch/sw/lcg/app/nightlies/dev/Tue/ROOT/ROOT_5_34_00-patches

(with ‘Tue’ cycling through the days of the week). Soon enough it’ll be in one of the v5-34 lettered releases.

For valgrind, simple memcheck should do. Make sure to use the suppression file that comes with python, though.
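As a rough sketch, the command line could be assembled like this. The helper memcheck_command is hypothetical; performance.py is the script name from the traceback earlier in the thread, and the location of valgrind-python.supp (shipped under Misc/ in the CPython source tree) is an assumption that depends on the installation.

```python
import shlex


def memcheck_command(script, suppressions, python="python"):
    """Build the argv list for running `script` under valgrind's memcheck tool."""
    return ["valgrind", "--tool=memcheck",
            "--suppressions=" + suppressions,
            python, script]


# The suppression file path is an assumption; adjust it to the local install.
cmd = memcheck_command("performance.py", "valgrind-python.supp")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.call(cmd), or the printed line can be pasted into a shell.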

Thanks,
Wim