Crashes in TTree::SetBranchAddress

disimone · April 28, 2010, 1:55pm

Hi all,

I am trying to migrate my analysis from “plain” root/cint to proof.

I have a macro like this:

{
TProof * p=TProof::Open("xrootd@grid006.roma2.infn.it:1093");

TChain* myChain1=new TChain("myTree");
myChain1->Add("/some/path/*root");
myChain1->SetProof();
myChain1->Process("MySelector.C++","some parameter");

TChain* myChain2=new TChain("myTree");
myChain2->Add("/some/different/path/*root");
myChain2->SetProof();
myChain2->Process("MySelector.C++","some different parameter");

// repeat the above several times, each time with a different chain

}

MySelector.C was generated by TTree::MakeSelector (with modifications in the Process method, of course).

What happens is that the first chain is processed smoothly, but when I try to process the following ones, some workers crash mysteriously, with a stack trace pointing to TTree::SetBranchAddress:

08:41:20 21160 Wrk-0.3 | *** Break : illegal instruction
Generating stack trace…
0x00df39f8 in from /lib/tls/libc.so.6
0x078b184a in TTree::SetBranchAddress(char const*, void*, TBranch**, TClass*, EDataType, bool) + 0x56 from /opt/root/lib/libTree.so
0x01053fca in Z2mD3PDSelector::Init(TTree*) + 0x145e from /home/xrootd/proof/session-grid006-1272436592-24757/worker-0.3-grid007-1272436596-21160/./Z2mD3PDSelector_C.so

At the second chain, some workers die; at the third, some of the remaining ones die, and so on, until I am left with no workers at all…

The crash does not depend on the input files, but really on the fact that another chain was processed previously: the crashing jobs do not crash if they are the first ones to be processed.

I can share the code of the selector class of course, but I was kind of hoping that there is something wrong in my processing multiple chains with the same TProof setup. What is the recommended way to do so?

Thanks for your help,

Andrea.

pcanal · April 30, 2010, 7:08pm

Hi,

Could you provide a complete running example showing this problem?

Thanks,
Philippe.

disimone · May 4, 2010, 2:19pm

Hello Philippe,

Actually, I can do more: I found out how to solve the problem.

For the record, I had lines like these in my selector implementation:

#ifdef __MAKECINT__
#pragma link C++ class vector+;
#pragma link C++ class vector+;
#pragma link C++ class vector+;
#pragma link C++ class vector+;
#endif

ROOT was actually informing me that the dictionaries for those types already existed, but it was also quite clear that I could safely ignore the message.

It turns out that if I remove those lines altogether, the mysterious crashes disappear.

Just for my education: is this the expected behavior? Why?

Thanks for your help,

Andrea.

pcanal · May 5, 2010, 12:05pm

Hi Andrea,

This is (of course) not the expected behavior. I suspect that the extra dictionary was inducing a difference in the order (and possibly even the set) of dictionary loading and initialization, which in turn tickled a defect in the reflection sub-system. I cannot really tell what the actual problem is without being able to reproduce it (and without knowing which release of ROOT this affects).

Cheers,
Philippe.

nbubis · July 31, 2010, 8:17am

Hi,

I am having the same problem with crashes when using multiple chains in PROOF. I’m using ROOT 5.26 with ProofLite on a dual-core laptop. I don’t have any “pragma” lines, but I do have a number of #include lines in my “MySelector.C”. I’m trying to do something like this:

[code]
TProof * proof_instance = TProof::Open("");

for (int i = 0; i < N; i++) {
TList * input_list = new TList();
input_list->Add(…);
SetProofInputList(proof_instance,input_list);
TChain * chain = new TChain("T");
chain->Add(…);
chain->SetProof();
chain->Process("MySelector.C");
input_list->Delete();
output_list->Delete();
chain->Delete();
}[/code]

I consistently get crashes on the second “chain->Process()” :

#4  <signal handler called>
#5  0x00007fe37a6cdf69 in ?? () from /lib/libc.so.6
#6  0x0000000000f737e0 in ?? ()
#7  0x000000000040b542 in ?? ()
#8  0x00000000015b9dd0 in ?? ()
#9  0x00007fe37b14d9c2 in TProofLite::Process(TDSet*, char const*, char const*, long long, long long) () from /media/data1/School/Thesis/root/lib/libProof.so
#10 0x0000000000409ef5 in main (argc=1, argv=<value optimized out>)
    at ../main.C:130

Any ideas on how to solve this?

nbubis · July 31, 2010, 1:13pm

Ok, I seem to have solved this.

One needs to completely clear the proof instance before adding new chains, i.e.:

proof_instance->ClearInput();
proof_instance->ClearData();
proof_instance->ClearInputData();
chain->Reset();

Hope this helps someone out there.

Nathaniel

ganis · July 31, 2010, 3:37pm

Dear Nathaniel,

You get a crash because you delete the input objects via input_list->Delete(), but the (at this point invalid) pointers to the same objects are still in the PROOF internal input list (the one that you modify with TProof::AddInput()).

Why do you need a separate input_list and a call to SetProofInputList, rather than calling AddInput directly?

Anyhow, the input objects not needed or not wanted in the next query must be removed, either with proof->ClearInput(), which removes everything, or with proof->GetInputList()->Remove(object).

G. Ganis
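
For readers who want to see the ownership issue in isolation, here is a minimal, ROOT-free sketch of the pattern described above. It is only a model, not PROOF code: InputObject, InputRegistry and runQueries are hypothetical names invented for this illustration, and the registry stands in for a long-lived input list that stores pointers without owning the objects. The point it tries to show is the ordering: an entry has to leave the list before the object it points to goes away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-query input object (not a ROOT class).
struct InputObject {
    std::string name;
};

// Hypothetical stand-in for a long-lived, non-owning input list.
// It stores raw pointers and never deletes what they point to.
class InputRegistry {
public:
    void add(const InputObject* obj) { entries_.push_back(obj); }

    // Drop one entry (similar in spirit to removing a single object).
    void remove(const InputObject* obj) {
        entries_.erase(std::remove(entries_.begin(), entries_.end(), obj),
                       entries_.end());
    }

    // Drop every entry (similar in spirit to clearing the whole list).
    void clear() { entries_.clear(); }

    std::size_t size() const { return entries_.size(); }

    // Dereferences every stored pointer, as a query would when reading inputs.
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const InputObject* obj : entries_) out.push_back(obj->name);
        return out;
    }

private:
    std::vector<const InputObject*> entries_;
};

// Runs nQueries queries against one session-wide registry and returns the
// largest number of inputs that any single query saw.
std::size_t runQueries(InputRegistry& session, int nQueries) {
    std::size_t maxSeen = 0;
    for (int i = 0; i < nQueries; ++i) {
        InputObject param{"param" + std::to_string(i)};
        session.add(&param);
        maxSeen = std::max(maxSeen, session.names().size());
        session.remove(&param);  // unregister before param is destroyed
    }
    return maxSeen;
}
```

If the remove call is dropped, the registry keeps a pointer to an object that no longer exists after each iteration, and the next call to names() dereferences it, which is undefined behavior. In this model that is the analogue of the crash on the second query; with the remove (or a clear) in place, every query sees exactly one input.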

nbubis · August 1, 2010, 8:30pm

ganis,

Thanks for the quick reply.

One more (hopefully last) question for this thread: How do you properly clear the output list?
I’ve been trying:

Is there another way?

ganis · August 2, 2010, 7:53am

Hi

The output list is created from scratch for each query. You do not need to clear it yourself.
The current output list (the one that you access via TProof::GetOutputList()) is owned by a dedicated TQueryResult object (see TProof::ShowQueries()), and removing objects from it may have side effects.

Gerri
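
As a rough illustration of why no manual clearing is needed in that scheme, here is a small ROOT-free model. It is a sketch under assumptions, not how PROOF is implemented: QueryResult and Session are hypothetical types made up for this example, and the output names are invented. Each query builds its own result object, which owns a fresh output list, so nothing from a previous query can leak into the current one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of a per-query result that owns its output list.
struct QueryResult {
    std::string query;
    std::vector<std::string> outputs;
};

// Hypothetical model of a session that keeps one result per processed query.
class Session {
public:
    // Runs a "query": builds a brand-new result with its own output list.
    const QueryResult& process(const std::string& query, int nOutputs) {
        QueryResult result;
        result.query = query;
        for (int i = 0; i < nOutputs; ++i) {
            result.outputs.push_back(query + "_out" + std::to_string(i));
        }
        // std::deque::push_back keeps references to earlier results valid.
        results_.push_back(result);
        return results_.back();
    }

    // Output list of the most recent query; empty if nothing has run yet.
    const std::vector<std::string>& currentOutputs() const {
        static const std::vector<std::string> kEmpty;
        return results_.empty() ? kEmpty : results_.back().outputs;
    }

    std::size_t queryCount() const { return results_.size(); }

private:
    std::deque<QueryResult> results_;
};
```

In this model, calling process twice and then reading currentOutputs() returns only the second query's items, while the first query's list stays untouched inside its own result. Editing a list through the current-output accessor would mean modifying something the result object owns, which mirrors the warning above about side effects.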