It is possible to add several TFile objects to a TChain using wildcard notation, like this:
TChain ch("T");
ch.Add("/path/to/files/*.root");
The question is: how do I add TNetFile objects in the same way?
The manual says that it is possible to use wildcards but it does not say if it is possible to use them with TNetFiles.
The TChain method Add does not support wildcards for remote files. Since several people have asked for similar features, we have just committed to the CVS head a new static method for TFileInfo which creates a list of TFileInfo objects; the list can then be passed as input to TChain::AddFileInfoList. In your case it would work like this
The alternative is to loop over the directory entries
TChain ch("T");
// OpenDirectory returns an opaque handle (null on failure)
void *dir = gSystem->OpenDirectory("root://remotehost/path/to/files/");
const char *ent;
while (dir && (ent = gSystem->GetDirEntry(dir))) {
   TString fn = Form("root://remotehost/path/to/files/%s", ent);
   if (fn.EndsWith(".root")) {
      FileStat_t st;
      // GetPathInfo returns 0 on success; keep regular files only
      if (!gSystem->GetPathInfo(fn, st) && R_ISREG(st.fMode))
         ch.Add(fn);
   }
}
if (dir) gSystem->FreeDirectory(dir);
As for adding TNetFile objects: TChain contains only meta information about files and creates the real TFile (TNetFile, …) objects internally only when needed.
In which case would you need that?
From the error message that you get, I can guess that your URL corresponds to an XROOTD redirector. Unfortunately, for the time being, these listing operations are only supported for XROOTD data servers; support for redirectors may be added in the future (I’ll check with the author). We will add a warning about this.
Back to your problem, you should try to get the information about the files which are supposed to be on the XROOTD cluster from another place (e.g. a catalog); you can then form a list of TFileInfo objects, with the main URL in TFileInfo in the form
(assuming that the redirector is acas0420.usatlas.bnl.gov); finally, you can feed the list into a chain using TChain::AddFileInfoList. Of course, you can also save the URLs into a text file and use TFileInfo::CreateList(“text_file”) to create the list of TFileInfo objects.
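As a rough illustration of forming the URLs from catalog entries, here is a minimal sketch in plain C++. The helper name MakeXrootdUrls is hypothetical, and the sketch assumes the common root://host//absolute/path convention for XROOTD URLs; the exact form referred to above is not reproduced here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build full XROOTD URLs from a redirector host and a
// list of file paths obtained elsewhere (e.g. from a catalog).
// Assumes the "root://host//absolute/path" URL convention.
std::vector<std::string> MakeXrootdUrls(const std::string &redirector,
                                        const std::vector<std::string> &paths)
{
   std::vector<std::string> urls;
   urls.reserve(paths.size());
   for (const auto &p : paths) {
      if (p.empty()) continue;              // skip empty catalog entries
      std::string url = "root://" + redirector + "/";
      if (p.front() != '/') url += '/';     // make the path part absolute
      url += p;
      urls.push_back(url);
   }
   return urls;
}
```

Each resulting string could then be written to the text file mentioned above, or passed one by one to TChain::Add, since no wildcard expansion is involved.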
Thanks for your quick reply.
Do you mean that I should log in to that XROOTD redirector and find out the list of files? But I am not allowed to log in to that machine. Can I send a shell command to that machine via gProof?
Yes, if you have PROOF workers on each of the XROOTD data servers, you can try
gProof->Exec(".! ls /data/cache/HPTV")
I am, however, in contact with Andy Hanushevsky to see how we can improve the “listing” functionality via the redirector. You are not the first one to ask.
It has been decided not to touch TChain; a new class, TFileCollection, describing a set of files has been introduced. It incorporates the methods to parse directories or text files with lists of files to be included.
In your specific case you should do the following:
TChain ch("T");
TFileCollection fc("dum"); // The name is irrelevant
fc.AddFromDirectory("root://remotehost/path/to/files/*.root");
ch.AddFileInfoList(fc.GetList());
If your files are listed in a file, e.g. ‘myfilelist.txt’ such that
Unfortunately, TFileCollection will not work for us, as we use an xrootd redirector as well. Have there been any improvements for this case? Our group has been using input files containing the names of the files to be chained as a workaround, but it would be nice to find a cleaner solution.
The answer to your question is that there has not been any change in ROOT in this respect.
The class TFileCollection has all the methods to acquire a list of files from a directory, even with wildcards (method TFileCollection::AddFromDirectory),
but the “problem” here is on the XROOTD side.
By construction, XROOTD does not contain metadata information about the available files, so listing functionality is limited to the data server nodes.
The idea behind this is that you know from some other place (a catalog or a database) which files you want, and then you ask the XROOTD system for those files.
The net result is that TFileCollection::AddFromDirectory via a redirector will not work.
To somewhat overcome this problem, recently support for FUSE-based fs-like mounting has been added to XROOTD: in this case the XROOTD cluster is seen as
a mounted disk on your machine, and directory listing should work. XrootdFS (that’s the name) is still somewhat experimental but there are several people trying
it out, so you may want to take a look and give it a try (see wt2.slac.stanford.edu/xrootdfs/).
I see that an old discussion here about wildcarding had something to do with TXNetFile and an xrootd cluster. Some people get annoyed by the fact that it’s difficult to get the expected results when dealing with primitives which rely on some form of access to the list of files, i.e. metadata. Wildcarding can be considered one of these.
From the pure xrootd perspective, to have access to ls-like primitives one has to do a much more complicated setup, and pass (in a local cluster) through FUSE or implement something specific. In that case one will have a server or a cluster whose only purpose is to give filesystem-like metadata, without affecting the data part. I agree, it is a mess for most small installations, but there are good reasons for this. Some descriptions and hints for these setups can be found in the latest xrootd-related presentations.
This apparently crude omission has to do with the emphasis that the xrootd system puts on performance and scalability. The general idea is that as a repository grows and grows, these issues can become absolute showstoppers for a storage system. That’s the generic reason why it’s preferable to deal in other ways with primitives which do not scale from a pure storage system perspective. So it’s preferable to have one feature less from the beginning than to be sure that nothing will work at some point.
For a pure xrootd cluster, having global ls-like primitives has recently become even more controversial, since you can build meta-clusters where the sub-clusters can be on different continents, and still contact a redirector which may be replicated in N different places (thus giving location-aware load balancing). With such deployments, a global ls-like functionality becomes an even more questionable feature.
This is not the right place to deal with this, but typical data analysis frameworks have some form of metadata catalog, which is usually accessed in an offline fashion. This way, one can build a list of the interesting files and get access to them without worrying too much.