TChain: wildcard notation and TNetFile

Hello,

It is possible to add several files to a TChain using wildcard notation, like this:

TChain ch("T");
ch.Add("/path/to/files/*.root");

The question is: how do I add TNetFile objects in the same way?
The manual says that wildcards can be used, but it does not say whether they work with TNetFiles.

This code works as expected:

TChain ch("T");
ch.Add("root://remotehost/path/to/files/data.root");

but why does the following not work?

TChain ch("T");
ch.Add("root://remotehost/path/to/files/*.root");

My ROOT version is 5.15/05.

Hi Konstantin,

The TChain method Add does not support wildcards in remote URLs such as root:// paths. Since several people have asked for similar features, we have just committed to the CVS head a new static method of TFileInfo which creates a list of TFileInfo objects; the list can then be passed as input to TChain::AddFileInfoList. In your case it would work like this:

TChain ch("T");
ch.AddFileInfoList(TFileInfo::CreateListMatching("root://remotehost/path/to/files/*.root"));

The alternative is to loop over the directory entries

TChain ch("T");
void *dir = gSystem->OpenDirectory("root://remotehost/path/to/files/");
const char *ent;
while ((ent = gSystem->GetDirEntry(dir))) {
    TString fn = Form("root://remotehost/path/to/files/%s", ent);
    if (fn.EndsWith(".root")) {
       FileStat_t st;
       if (!gSystem->GetPathInfo(fn, st) && R_ISREG(st.fMode)) 
          ch.Add(fn);
    }
}
gSystem->FreeDirectory(dir);

As for adding TNetFile objects: a TChain contains only meta information about the files and creates the real TFile (TNetFile, …) objects internally only when needed.
In which case would you need that?
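To illustrate the point about meta information, here is a toy sketch in plain C++ (no ROOT needed). ToyFile and ToyChain are hypothetical names made up for this example and are not ROOT's implementation; they only show the general idea that adding a URL records a string, while the file object is created on first access.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an opened file; counts how many were ever created.
struct ToyFile {
    explicit ToyFile(const std::string &u) : url(u) { ++opened; }
    std::string url;
    static int opened;
};
int ToyFile::opened = 0;

// Hypothetical chain: Add() only records the URL, File() opens on demand.
class ToyChain {
public:
    void Add(const std::string &url)
    {
        urls_.push_back(url);
        files_.emplace_back();  // empty slot, nothing opened yet
    }
    std::size_t Size() const { return urls_.size(); }
    const ToyFile &File(std::size_t i)
    {
        assert(i < urls_.size());
        if (!files_[i]) files_[i].reset(new ToyFile(urls_[i]));  // first use
        return *files_[i];
    }
private:
    std::vector<std::string> urls_;
    std::vector<std::unique_ptr<ToyFile> > files_;
};
```

With this sketch, adding two URLs leaves ToyFile::opened at 0; it only becomes 1 after the first call to File().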

G. Ganis

Hi again,

Small correction: to avoid memory leaks, the first solution should read

TChain ch("T");
TList *filelist = TFileInfo::CreateListMatching("root://remotehost/path/to/files/*.root");
if (filelist) {
   ch.AddFileInfoList(filelist);
   delete filelist;
}

We will shortly add a TChain method that does this automatically.

G. Ganis

Hi Ganis,

I tried your following example:
dir = gSystem->OpenDirectory("root://remotehost/path/to/files/"); 

but ROOT returns with the error message:

I also tried " TFileInfo::CreateListMatching":

TList *filelist = TFileInfo::CreateListMatching("root://acas0420.usatlas.bnl.gov//data/cache/HPTV/*.root");

Any idea?

–Shuwei

Dear Shuwei,

From the error message that you get I guess that your URL points to an XROOTD redirector. Unfortunately, for the time being, these listing operations are only supported for XROOTD data servers; support for redirectors may be added in the future (I’ll check with the author). We will add a warning about this.

Back to your problem, you should try to get the information about the files which are supposed to be on the XROOTD cluster from another place (e.g. a catalog); you can then form a list of TFileInfo objects, with the main URL in TFileInfo in the form

root://acas0420.usatlas.bnl.gov//data/cache/HPTV/

(assuming that the redirector is acas0420.usatlas.bnl.gov); finally you can feed the list into a chain using TChain::AddFileInfoList. Of course you can also save the URLs into a text file and use TFileInfo::CreateList("text_file") to create the list of TFileInfo objects.

G. Ganis

Dear Ganis,

Thanks for your quick reply.

Do you mean that I should log in to that XROOTD redirector and find out the list of files? But I am not allowed to log in to that machine. Can I send a shell command to that machine via gProof?

–Shuwei

Dear Shuwei,

Yes, if you have PROOF workers on each of the XROOTD data servers, you can try

gProof->Exec(".! ls /data/cache/HPTV")

I am, however, in contact with Andy Hanushevsky to see how we can improve the “listing” functionality via the redirector. You are not the first one to ask.

G. Ganis

Is this still the recommended way to handle wildcarding for xrootd files, or has there been an addition to TChain since this posting?

Thanks,
Heather

Hi Heather,

It has been decided not to touch TChain; a new class, TFileCollection, describing a set of files has been introduced. It incorporates the methods to parse directories or text files with lists of files to be included.

In your specific case you should do the following:

TChain ch("T");
TFileCollection fc("dum"); // The name is irrelevant
fc.AddFromDirectory("root://remotehost/path/to/files/*.root");
ch.AddFileInfoList(fc.GetList());

If your files are listed in a file, e.g. ‘myfilelist.txt’ such that

$ cat myfilelist.txt
root://remotehost/path/to/files/file1.root
root://remotehost/path/to/files/file2.root
...

then you can do

TChain ch("T");
TFileCollection fc("dum","","myfilelist.txt");
ch.AddFileInfoList(fc.GetList());

G. Ganis

Unfortunately, TFileCollection will not work for us, as we use an xrootd redirector as well. Have there been any improvements for this case? As a workaround, our group has been using input files containing the names of the files to be chained, but it would be nice to find a cleaner solution.

Thanks,
Heather

Dear Heather,

The answer to your question is that there has not been any change in ROOT in this respect.
The class TFileCollection has all the methods to acquire a list of files from a directory, even with wildcards (method TFileCollection::AddFromDirectory),
but the “problem” here is on the XROOTD side.

By construction XROOTD does not contain metadata about the available files, so listing functionality is limited to the data server nodes.
The idea behind this is that you know from somewhere else (a catalog or a database) which files you want, and then you ask the XROOTD system for those files.
The net result is that TFileCollection::AddFromDirectory via a redirector will not work.
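If the list of files is already known from a catalog, the wildcard can be applied on the client side instead of asking the redirector. The following is a minimal sketch in plain C++; GlobMatch and FilterUrls are hypothetical helpers written for this example, not ROOT functions, and they support only '*' and '?'.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return true if s matches pat, where '*' matches any run of characters
// (including none) and '?' matches exactly one character.
bool GlobMatch(const std::string &pat, const std::string &s)
{
    std::size_t p = 0, i = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t mark = 0;                  // where in s that '*' started
    while (i < s.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;
            mark = i;
        } else if (p < pat.size() && (pat[p] == '?' || pat[p] == s[i])) {
            ++p;
            ++i;
        } else if (star != std::string::npos) {
            // Backtrack: let the last '*' absorb one more character.
            p = star + 1;
            i = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '*') ++p;  // trailing stars match nothing
    return p == pat.size();
}

// Keep only the URLs from a known list that match the pattern.
std::vector<std::string> FilterUrls(const std::vector<std::string> &urls,
                                    const std::string &pattern)
{
    std::vector<std::string> out;
    for (std::size_t k = 0; k < urls.size(); ++k)
        if (GlobMatch(pattern, urls[k])) out.push_back(urls[k]);
    return out;
}
```

Note that '*' in this sketch also matches '/', so a pattern like root://remotehost/path/to/files/*.root would select files in subdirectories too, unlike a shell wildcard. The selected URLs can then be added to the chain one by one.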

To partly overcome this problem, support for FUSE-based filesystem-like mounting has recently been added to XROOTD: in this case the XROOTD cluster is seen as a mounted disk on your machine, and directory listing should work. XrootdFS (that’s the name) is still somewhat experimental, but several people are trying it out, so you may want to have a look and give it a try. See wt2.slac.stanford.edu/xrootdfs/.

G. Ganis

Hi,

I see that an old discussion here about wildcarding had to do with TXNetFile and an xrootd cluster. Some people get annoyed that it’s difficult to get the expected results from primitives which rely on some form of access to the list of files, i.e. metadata. Wildcarding can be considered one of these.

From the pure xrootd perspective, to have access to ls-like primitives one has to do a much more complicated setup, and either go (in a local cluster) through FUSE or implement something specific. In that case one will have a server or a cluster whose only purpose is to provide filesystem-like metadata, without affecting the data part. I agree, it is a mess for most small installations, but there are good reasons for this. Some descriptions and hints for these setups can be found in the latest xrootd-related presentations.

This apparently crude omission comes from the emphasis that the xrootd system puts on performance and scalability. The general idea is that as a repository grows and grows, these issues can become absolute showstoppers for a storage system. That’s the general reason why it’s preferable to handle in other ways those primitives which do not scale from a pure storage system perspective. So it’s preferable to have one feature less from the beginning than to be sure that nothing will work at some point.

For a pure xrootd cluster, global ls-like primitives have recently become even more controversial, since you can build meta-clusters whose sub-clusters are on different continents, and still contact a redirector which may be replicated in N different places (thus giving location-aware load balancing). With such deployments, a global ls-like functionality is an even more questionable feature.

This is not the right place to go into this, but the typical data analysis frameworks have some form of metadata catalog, which is typically accessed offline. This way, one can build a list of the interesting files and get access to them without worrying too much.

Fabrizio