TDSet, TChain etc

Hi experts,
I would like to build a chain from several thousand files and pass it to a selector. But files are added to a chain by name, which hides bad files: you must either open each file manually to make sure it is not a zombie, or risk adding non-existent or corrupted files to the chain.

Perhaps a better way is to pass a TDSet, adding to it only those files that are “staged” and “non-corrupted”. In the documentation this is the method below:
Bool_t TDSet::Add (TCollection *fileinfo, const char *meta=0, Bool_t availableOnly=kFALSE, TCollection *badlist=0)

Question 1) What does “staged” mean, and how do I “stage” the files in the TDSet?

Question 2) Is there a way (like dataDir.GetListOfFiles()) to get a collection of TFiles instead of TSystemFiles? Or to interpret the TSystemFiles as TFiles? (see my error below)

Question 3) Generally, how do I make use of the “availableOnly” flag to get only good files in my TDSet?

A code snippet from my program is:

  TChain *inTree=new TChain("mytree");
  inTree->Add(fileName);
  
  TString dataPath;
  dataPath="/scratch/mgv4ce/rootfiles/";
  TSystemDirectory dataDir(dataPath, dataPath);
  TList *flist = dataDir.GetListOfFiles();
  TDSet *data= new TDSet();
  data->Add(flist,0,kTRUE,0);
  
  mySelector *s=new mySelector();
  Long64_t nentries=1000000000;
  Option_t *option="";
  proof->Process(data,s,option,nentries,0); //this says the data is empty
  //inTree->Process(s,option,nentries,0);  //this works just fine

The error I get is

Warning in <TDSet::Add>: found object fo unexpected type TSystemFile - ignoring
Warning in <TDSet::Add>: found object fo unexpected type TSystemFile - ignoring
Warning in <TDSet::Add>: found object fo unexpected type TSystemFile - ignoring
...etc
Error in <TDataSetManagerFile::ParseUri>: DataSet name is empty
Error in <TProofLite::Process>: from AssertDataSet: no dataset(s) found on the master corresponding to: 

Many thanks for the help!

Dear mgv4ce

For large datasets in PROOF, please see root.cern.ch/working-data-sets.
Alternatively you can use TChain::Add with a directory path as argument to load all the files in the directory, but without checks on their existence or quality.

G Ganis

Dear Ganis,
Thanks for the reply.

I looked at the tutorial already. It does not answer my basic question:
Q: How do I use the method below?
Bool_t TDSet::Add (TCollection *fileinfo, const char *meta=0, Bool_t availableOnly=kFALSE, TCollection *badlist=0)

I presently am doing:

   TString protocol = "file://";
   TString inDIR="/directory/path/";
   TSystemDirectory inDir(inDIR, inDIR);
   TFileCollection* fc = new TFileCollection("files", "files");
   fc->Add(protocol + inDIR + "*.root");
   fc->SetDefaultTreeName("usertree");
   TDSet *data= new TDSet();
   data->Add(fc,"",kTRUE);

This results in a compilation error:

main.cpp: In function ‘int main(int, char**)’:
main.cpp:89: error: no matching function for call to ‘TDSet::Add(TFileCollection*&, const char [1], const Bool_t&)’
/nv/blue/bkw1a/apps/root-5.34.28/include/TDSet.h:193: note: candidates are: virtual Bool_t TDSet::Add(const char*, const char*, const char*, Long64_t, Long64_t, const char*)
/nv/blue/bkw1a/apps/root-5.34.28/include/TDSet.h:196: note:                 virtual Bool_t TDSet::Add(TDSet*)
/nv/blue/bkw1a/apps/root-5.34.28/include/TDSet.h:197: note:                 virtual Bool_t TDSet::Add(TCollection*, const char*, Bool_t, TCollection*)
/nv/blue/bkw1a/apps/root-5.34.28/include/TDSet.h:199: note:                 virtual Bool_t TDSet::Add(TFileInfo*, const char*)

I tried simply passing the TFileCollection cast to a TCollection, but this causes segfaults.

How can I get the underlying TFileInfo from the TFileCollection so that I can pass the list?

Thanks again!
Mike

Dear Mike,

TFileCollection has a getter for the list of files:

data->Add(fc->GetList(),"",kTRUE);

G Ganis