Is there a way to process a ds and not compile locally?

gwatts · January 5, 2012, 7:18am

Hi,
I’ve been trying to use TProof::Process("runner.cpp++"). There is one downside to this - it forces a local build of runner.cpp. Can I avoid this? I definitely want it to build on the slave (and I suppose the master)! But there is just no need locally.

The thing is - I have no need for a local build. Further, in one executable I make a bunch of these builds, so if I want to keep ROOT clean I need to unload each one (painful)… and each build takes on the order of a minute.

I was hoping I could send the builds remotely with the TProof::Load, and then just give Process the class name. However, that doesn’t seem to work as it expects it to be a file!

Many thanks in advance!

Cheers, Gordon.

ganis · January 5, 2012, 5:32pm

Hi,

The file ‘runner.cpp’ and the corresponding header define a TSelector class from which a TSelector object is initialized on the workers and on the client. PROOF needs these instances of the processed TSelector, so the answer to your question is, no, you cannot skip the local instantiation of the TSelector class.

I understand that compiling each time takes time, but why do you need to force recompilation with ‘++’? Does it mean that you want to rebuild even if the code did not change?
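
For reference, the suffixes under discussion follow ROOT's ACLiC convention: a single trailing '+' compiles the file only if it has changed since the last build, '++' forces a rebuild every time, and no suffix means the file is interpreted. A minimal C++ sketch of that rule is below. The names `parseSelectorSpec`, `SelectorSpec` and `BuildMode` are hypothetical and used only for illustration; they are not ROOT functions, and ACLiC option letters such as `+g` or `+O` are deliberately not handled.

```cpp
#include <cassert>
#include <string>

// How a trailing ACLiC marker on a selector file name is interpreted.
enum class BuildMode { Interpret, CompileIfChanged, ForceRecompile };

// Hypothetical illustration type: file name plus requested build mode.
struct SelectorSpec {
    std::string file;  // file name with the trailing marker removed
    BuildMode mode;
};

// Hypothetical helper (not part of ROOT): split e.g. "runner.cpp+" into
// the file name and the build mode implied by the suffix.
// Only the bare "+" and "++" markers are recognised in this sketch.
SelectorSpec parseSelectorSpec(const std::string& spec) {
    assert(!spec.empty());
    auto endsWith = [&spec](const std::string& suffix) {
        return spec.size() >= suffix.size() &&
               spec.compare(spec.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    // Check "++" first, since a string ending in "++" also ends in "+".
    if (endsWith("++")) {
        return {spec.substr(0, spec.size() - 2), BuildMode::ForceRecompile};
    }
    if (endsWith("+")) {
        return {spec.substr(0, spec.size() - 1), BuildMode::CompileIfChanged};
    }
    return {spec, BuildMode::Interpret};
}
```

Under these assumptions, "runner.cpp++" maps to a forced rebuild on every call, while "runner.cpp+" reuses the existing library as long as the source is unchanged.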

[quote=“gwatts”]I was hoping I could send the builds remotely with the TProof::Load, and then just give Process the class name. However, that doesn’t seem to work as it expects it to be a file!
[/quote]
You can also specify the selector by class name, but the class must then be defined on the client as well.

It looks like you would like a processing mode in which nothing is done on the client except receiving the final output list. We do not have this now; it may fit into a plan that we have to make the client lighter.

G. Ganis

gwatts · January 5, 2012, 5:47pm

Hi,
Thanks! The “++” was a typo when I wrote up the post. In my code I use only “+”. Thanks for pointing that out - some of these builds take forever!

Ok, the key here is - the client needs to know the class. I’ve got infrastructure in my code to deal with that - it is just always a bit tricky unloading things correctly in ROOT - you are never totally sure when someone else is holding a reference to an object in the loaded library.

The other thing that I’ve run afoul of is that you always expect a header file. Currently I use MakeProxy to generate a .h file, and then in my .cpp file I define the class and the code to actually run the TTree (if it isn’t clear what I mean, see the attached files). I like this because the .cpp file with the actual query is generated by my code, and thus I have everything in a single file. When running on a TTree locally this setup is no problem. It looks like the PROOF system isn’t flexible enough to handle this setup - I will have to split my TSelector definition into two files. Is that correct?

Thanks again for your help!

-Gordon.
ntuple_CollectionTree.h (346 KB)
queryTestSimpleQuery.cxx (3.16 KB)

ganis · January 6, 2012, 12:33pm

Hi,

PROOF does not require the selector to be split into two files, but all files must be available when they are needed. Files with extension .h or .hh and with the same name as the .cpp are automatically uploaded. Other files are not, and you have to upload them separately.
To do that you can use Load or, if you have several of them, a PAR file. PROOF automatically creates a link to the PAR package top directory and adds ‘-I’ to the include path, so you can include these files directly in your selector.
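
The upload rule above can be sketched in plain C++ as follows. This is only an illustration of the naming rule as described in this post, not PROOF's actual implementation; `companionHeaders` and `needsExplicitUpload` are hypothetical names, and applying the rule to a .cxx source (rather than .cpp) is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: headers that share the selector file's base name and
// would therefore be uploaded automatically according to the rule above.
std::vector<std::string> companionHeaders(const std::string& selectorFile) {
    assert(!selectorFile.empty());
    const std::size_t slash = selectorFile.find_last_of('/');
    const std::size_t dot = selectorFile.find_last_of('.');
    // Strip the extension only if the dot belongs to the file name itself,
    // not to a directory component earlier in the path.
    const bool hasExtension =
        dot != std::string::npos && (slash == std::string::npos || dot > slash);
    const std::string stem = hasExtension ? selectorFile.substr(0, dot) : selectorFile;
    return {stem + ".h", stem + ".hh"};
}

// Hypothetical helper: true if `header` does not match the selector's base
// name and so would have to be sent separately (via Load or a PAR file).
bool needsExplicitUpload(const std::string& selectorFile, const std::string& header) {
    for (const std::string& candidate : companionHeaders(selectorFile)) {
        if (candidate == header) {
            return false;
        }
    }
    return true;
}
```

With the attached files, `needsExplicitUpload("queryTestSimpleQuery.cxx", "ntuple_CollectionTree.h")` is true under this rule, because the header's name differs from the source file's name, so it would need to be sent with Load or packaged in a PAR file.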

Gerri

gwatts · January 6, 2012, 5:20pm

Hi,
Ok. So exactly how should I deal with this in my case? Let’s say that, using your PutFile trick (in the other post), I work around the current bug and so I can use Load to transfer and build all the code locally on my client, on the master, and on the workers. Then I create my TSelector object locally. I then want to call TProof::Process, don’t I? I can’t do that directly because there doesn’t appear to be a TProof::Process that takes a TSelector* as an argument. So, what would be the proper way to run my dataset on the PROOF cluster? Do I build a TChain out of my dataset or something like that, and then use its Process?

Many thanks for your help!

gwatts · January 9, 2012, 9:44am

I’ll continue this in another thread since the topic has changed so much from the original one.