Benefits of using xrootd

Nicola_Mori · December 27, 2016, 7:34pm

Hi, I’m wondering whether using xrootd would benefit the performance of my data analysis workflow. My data resides on a disk server; the processing is done on a compute cluster, and each processing node accesses the data by mounting the data disk via NFS. Typically, each node processes one data file. When multiple jobs are running, the disk server is hammered by multiple file requests and the network can also become congested, so there is a maximum number of jobs that can run in parallel before network I/O becomes the bottleneck.
To mitigate the problem I am thinking about running xrootd on the disk server and using it for data access instead of NFS. I have never used xrootd, so before starting I would like to know whether it can, at least in theory, improve I/O performance when used as described above.
Thanks.

ganis · January 10, 2017, 4:31pm

Dear Nicola,

XRootD was designed to handle many concurrent clients running analysis jobs, and several measurements have shown that it does so quite efficiently. Direct comparisons with NFS v3 were done in the early days and showed better scalability for XRootD, though the figures may have changed with pNFS (NFS v4.1).
So it may be worth giving it a try, in particular if you have indications that network I/O is not yet the bottleneck. XRootD in single-server mode should not be difficult to set up.
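
As a rough sketch of what switching from NFS paths to XRootD access could look like on the job side, the helper below maps a path under the NFS mount to a `root://` URL for the same file. The host name, mount point and server directory are hypothetical placeholders, and port 1094 is assumed because it is the usual XRootD default; adapt all of them to the actual setup.

```python
import posixpath


def to_xrootd_url(nfs_path, nfs_mount, server_dir, host, port=1094):
    """Map a file path under an NFS mount to a root:// URL.

    nfs_mount  : where the data disk is mounted on the worker node (assumed)
    server_dir : absolute path of the same directory on the disk server (assumed)
    host, port : XRootD server address; 1094 is the usual default port
    """
    rel = posixpath.relpath(nfs_path, nfs_mount)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{nfs_path} is not under {nfs_mount}")
    remote = posixpath.join(server_dir, rel)
    # An absolute remote path yields the double slash after host:port.
    return f"root://{host}:{port}/{remote}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(to_xrootd_url("/mnt/data/run1/events.root",
                        "/mnt/data", "/data", "diskserver.example.org"))
```

ROOT's `TFile::Open` accepts such URLs when ROOT is built with XRootD support, so the analysis code itself should need little more than the changed file name.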

Note, however, that other parameters can affect the performance of concurrent file reading, in particular proper use of the TTreeCache. You may want to look at that as well, instead of or in addition to XRootD, if you have not already done so.
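
For illustration, a minimal PyROOT sketch of enabling the TTreeCache explicitly is given below. The 30 MB cache size and the choice to cache all branches are assumptions picked for the example, not recommendations; depending on the ROOT version a cache may already be enabled by default. The function names are hypothetical.

```python
def enable_tree_cache(tree, cache_bytes=30 * 1024 * 1024):
    """Turn on the TTreeCache for a TTree and register all its branches."""
    tree.SetCacheSize(cache_bytes)    # size of the read cache in bytes
    tree.AddBranchToCache("*", True)  # cache every branch, including sub-branches


def count_entries(url, tree_name):
    """Open a file (local path or root:// URL), read every entry, return the count."""
    import ROOT  # PyROOT; imported here so the helper above has no hard dependency

    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        raise OSError(f"cannot open {url}")
    tree = f.Get(tree_name)
    enable_tree_cache(tree)
    n = 0
    for i in range(tree.GetEntries()):
        tree.GetEntry(i)  # reads are served from the cache in large blocks
        n += 1
    f.Close()
    return n
```

If the analysis only reads a few branches, registering just those with `AddBranchToCache` instead of `"*"` should reduce the amount of data transferred.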

G Ganis