Reading a lot of files with RFIO is slow

Hi

When I use the RFIO protocol to read the files (200 ntuples) on a DPM storage element, the analysis takes 10 min to run. But if I merge the ntuples into big files (~2 GB each) and put them on the SE, it drops to ~50 s.
I open the files like this:
result = new TChain("CollectionTree");
result->Add("file1");
result->Add("file2");
result->Add("file3");

What could be the origin of such a difference?

Thanks

Karim

Please be more explicit about your setup:
-where are the root files
-where do you run the client
-network speed and bandwidth between the RFIO server and your client
-version of ROOT

Rene

The DPM server is part of the IN2P3-CPPM site, and the client is running on a local computer.

-where are the root files
rfio:/dpm/mrs.grid.cnrs.fr/home/atlas/cppm/gwatts/user.GordonWatts.8069.J3_pythia_jetjet_muFIXED.v3.ntuple.v120006/user.GordonWatts.8069.J3_pythia_jetjet_muFIXED.v3.ntuple.v120006.AANT0._00001.root
rfio:/dpm/mrs.grid.cnrs.fr/home/atlas/cppm/gwatts/user.GordonWatts.8069.J3_pythia_jetjet_muFIXED.v3.ntuple.v120006/user.GordonWatts.8069.J3_pythia_jetjet_muFIXED.v3.ntuple.v120006.AANT0._00002.root

-where do you run the client
=> on a local computer (a very good one: SL4 32-bit, 4 GB of memory …)

-network speed and bandwidth between the RFIO server and your client
=> Gb/s

-version of ROOT
=> 5.14e

Karim

OK, this is a nice setup. The only thing that I can think of is a non-negligible network latency between your computer and the RFIO server.
Could you test it by running ping from your computer to the RFIO server?
If the latency is greater than 1 ms, then I would suggest trying version 5.17/09. This version (you must install it from source) contains a cache for RFIO that could substantially improve performance in this case.
Let me know.

Rene

The ping is good:

maratlas2>ping joe.mrs.grid.cnrs.fr
PING joe.mrs.grid.cnrs.fr (139.124.70.118) 56(84) bytes of data.
64 bytes from joe.mrs.grid.cnrs.fr (139.124.70.118): icmp_seq=0 ttl=61 time=0.215 ms
64 bytes from joe.mrs.grid.cnrs.fr (139.124.70.118): icmp_seq=1 ttl=61 time=0.207 ms
64 bytes from joe.mrs.grid.cnrs.fr (139.124.70.118): icmp_seq=2 ttl=61 time=0.208 ms
64 bytes from joe.mrs.grid.cnrs.fr (139.124.70.118): icmp_seq=3 ttl=61 time=0.206 ms
64 bytes from joe.mrs.grid.cnrs.fr (139.124.70.118): icmp_seq=4 ttl=61 time=0.203 ms
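
A rough way to read these numbers against the timings in the first post: the sketch below averages the five ping times and asks how many strictly sequential network round trips it would take for latency alone to explain the gap between the two runs (10 min versus ~50 s, so about 550 s). The function names are illustrative, and the assumption that the extra time is made of back-to-back round trips is a simplification, not something measured in this thread.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of the measured ping round-trip times, in milliseconds.
double meanRttMs(const std::vector<double>& rttMs) {
    assert(!rttMs.empty());
    return std::accumulate(rttMs.begin(), rttMs.end(), 0.0) / rttMs.size();
}

// Number of strictly sequential round trips needed for network latency
// alone to account for `extraSeconds` of wall-clock time.
// Assumption: every round trip costs exactly `rttMs` and none overlap.
double roundTripsNeeded(double extraSeconds, double rttMs) {
    assert(rttMs > 0.0);
    return extraSeconds / (rttMs / 1000.0);
}
```

With the five values above (0.215, 0.207, 0.208, 0.206, 0.203) the mean is about 0.208 ms, and roundTripsNeeded(600.0 - 50.0, mean) comes out at roughly 2.6 million round trips, or on the order of 13,000 per file for 200 files. Unless opening a file really involves that many sequential exchanges, ping latency by itself probably does not account for the slowdown.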

That leaves two possible explanations:
-the disks on the server are slower than on your local machine (unlikely)
-several jobs are using the same disk concurrently

Anyway, to make progress I suggest you try with 5.17/09 first.

Rene

This question is my fault – and Karim hasn’t responded because the move to this new version of ROOT requires me to update my MakeProxy files. I’m afraid I won’t get to this until after break, probably. Have a great holiday, at any rate! -Gordon.

Hi,
I finally did this test. Running on 4 large files the average rate was 90 events/second. Running on 157 small files the average rate was 22 events/second. The overhead that is killing us seems to be the file open overhead.

At Karim’s suggestion, I copied the 4 large files to a local disk and ran on them to see what the speed difference was – 200 events/second when the files are local.
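
To put the three rates on a common footing, here is a small sketch that converts them to time per event and, under a loudly hypothetical events-per-file figure, to extra seconds per small file. The function names and the eventsPerFile parameter are assumptions for illustration; the thread does not give event counts, and the estimate attributes the whole slowdown to per-file cost while ignoring the opens of the 4 large files.

```cpp
#include <cassert>

// Average wall-clock time per event, in milliseconds, for a given rate.
double msPerEvent(double eventsPerSecond) {
    assert(eventsPerSecond > 0.0);
    return 1000.0 / eventsPerSecond;
}

// Extra time per file, in seconds, if the entire difference between the
// slow and fast rates is attributed to a fixed per-file cost.
// `eventsPerFile` is hypothetical: it is not given anywhere in the thread.
double extraSecondsPerFile(double eventsPerFile, double slowRate, double fastRate) {
    assert(eventsPerFile > 0.0);
    return eventsPerFile * (msPerEvent(slowRate) - msPerEvent(fastRate)) / 1000.0;
}
```

The measured rates give about 45.5 ms per event on the 157 small files, 11.1 ms on the 4 large remote files, and 5 ms on the local copies. So the small-file layout costs roughly 34 ms more per event, while reading remotely instead of locally costs roughly 6 ms more per event. For a purely illustrative 100 events per small file, extraSecondsPerFile(100, 22, 90) would be a little over 3 s per file.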