Skip validation of datasets

I want to use PROOF to process a number of files in dCache. I’ve got a text file with the dcap:/…/ URLs for the files.

To process these files, I currently create a TDSet with the following code:

  TFileCollection fc("myFileList","A File List");
  TDSet* ds = new TDSet("myDSet");
  ds->Add( fc.GetList() );

Now, each time I call ds->Draw("…") or ds->Process(…), all files are validated by PROOF, which takes quite some time.

Is there a way to skip the validation step? For example, if I issue two Draw commands one after the other, I don’t see why the whole dataset needs to be validated again.

Additionally, is there a smarter way to “convert” the file list into a dataset that PROOF can process?
I tried to use TProof::RegisterDataSet, but this did not work for me. TProof::UploadDataSet does not seem to be what I want either, because I want to use the files directly from dCache instead of copying them to the PROOF nodes first.

Thanks in advance,

If the datasets will remain around for a while, then I think you can register the dataset with PROOF, and it will remember it between sessions. TProof::RegisterDataSet is the method. After that, you can just fetch the dataset from PROOF and use it directly (or, in many cases, pass its name as a string).

If these are transient datasets, then I guess this is a bit more difficult.
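For the persistent case, the register-once, process-by-name idea can be sketched roughly as follows. This is only an illustration under assumptions: the helper registerAndProcess is hypothetical, the two template parameters exist only so that the snippet compiles without ROOT (in a real session they would be a TProof and a TFileCollection), and whether the "V" (verify) option is honoured depends on how the dataset manager of the cluster is configured.

```cpp
#include <string>

// Hypothetical helper: register a file collection once under a name, then
// run a query that refers to the dataset by that name only.
// ProofT and CollectionT are template parameters purely so that this sketch
// compiles without ROOT; with ROOT they would be TProof and TFileCollection.
template <class ProofT, class CollectionT>
bool registerAndProcess(ProofT& proof, CollectionT& files,
                        const std::string& dataSetName,
                        const std::string& selector)
{
    // "V" asks PROOF to verify the files at registration time (assumption:
    // the dataset manager allows verification), so the per-file information
    // is stored together with the dataset.
    if (!proof.RegisterDataSet(dataSetName.c_str(), &files, "V")) {
        return false;
    }
    // Later queries pass the dataset name as a string.
    return proof.Process(dataSetName.c_str(), selector.c_str()) >= 0;
}
```

In a ROOT session the proof object would come from TProof::Open and the collection would be a TFileCollection filled from the list of dcap URLs. Subsequent sessions would then only need the Process call with the registered name.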


PROOF has no memory of the previous {query, run}; it has no way to know that the dataset of a run is exactly the same as that of the previous run, unless you tell PROOF explicitly by using the dataset technology, which was also designed so that the validation step is done only once.

We are working on a packetizer that uses the file information more dynamically, in particular one that does not need to know the number of entries in advance.
In the meantime, if you do not use datasets, you can try the following trick. Validation happens only if the number of entries for a file is not known, so if you specify those numbers when building the chain, using the second argument of AddFile

chain->AddFile(my_file, its_entries)

validation should be skipped.
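As a rough sketch of this trick, suppose the file list is extended so that each line holds a URL followed by its entry count (this "<url> <entries>" layout is an assumption, not something PROOF defines). The names FileEntry, readFileList and addFilesWithEntries are hypothetical, and the chain type is a template parameter only so that the snippet compiles without ROOT; with ROOT you would pass a TChain.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One input file together with its already known number of entries.
struct FileEntry {
    std::string url;
    long long   entries;
};

// Parse lines of the assumed form "<url> <entries>".
// Blank or malformed lines are skipped.
std::vector<FileEntry> readFileList(std::istream& in)
{
    std::vector<FileEntry> files;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        FileEntry f;
        if (fields >> f.url >> f.entries && f.entries >= 0) {
            files.push_back(f);
        }
    }
    return files;
}

// Add every file to the chain together with its entry count.
// ChainT only needs an AddFile(const char*, long long) member, as TChain has.
template <class ChainT>
int addFilesWithEntries(ChainT& chain, const std::vector<FileEntry>& files)
{
    int added = 0;
    for (const FileEntry& f : files) {
        chain.AddFile(f.url.c_str(), f.entries);  // second argument: known entries
        ++added;
    }
    return added;
}
```

The entry counts have to be obtained once by some other means, for example from a first validated run, and they must match the files exactly, since PROOF would then rely on them instead of opening each file.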

Let me know.

G. Ganis