PyROOT: possible bug in the interaction of XRootD and RDataFrame

Dear experts,

I would like to report an issue that may or may not be related to a ROOT or XRootD bug, observed in a specific Python use case.

Here you can find a reproducer of the issue:

import ROOT as r
from XRootD import client
import fnmatch
import os
import sys


def xrootd_glob(xrootd_path):
    """Return the XRootD URLs in the directory of xrootd_path that match its glob pattern."""
    if "///" in xrootd_path:
        raise RuntimeError("Invalid path: it contains '///' (too many slashes)")
    url = client.URL(xrootd_path)
    # Extract server and path components using XRootD's URL parser
    server = f"{url.protocol}://{url.hostid}"
    path_part = url.path
    # Extract URL parameters (authentication tokens, etc.)
    url_params = url.path_with_params.replace(
        url.path, '', 1) if url.path_with_params != url.path else ""

    try:
        dir_path, pattern = os.path.split(path_part)
    except ValueError:
        raise ValueError(
            "Invalid XRootD path format. Expected: root://server//path/to/dir/*.pattern"
        )
    fs = client.FileSystem(server)
    # Append the URL parameters (if any) to the directory path before listing it
    dir_path_with_params = f"{dir_path}/{url_params}"
    status, listing = fs.dirlist(dir_path_with_params)
    if not status.ok:
        raise RuntimeError(
            f"Failed to list directory {dir_path} on {server}: {status.message}"
        )
    matching_files = [
        f"{server}/{dir_path}/{entry.name}{url_params}" for entry in listing
        if fnmatch.fnmatch(entry.name, pattern)
    ]
    print(f"globbing xrootd : {xrootd_path} returns")
    for match in matching_files:
        print(match)
    return matching_files

if __name__ == "__main__":
    path = "root://eoshome-r.cern.ch//eos/user/r/rquaglia/TupleProcess*.root"
    files = xrootd_glob(path)
    # Without the RDataFrame lines below, the script runs and exits normally.
    df = r.RDataFrame("DecayTuple", files)
    h = df.Histo1D("B0_ETA")
    print(f"entries = {h.GetEntries()}")
    # With the RDataFrame used as above, the script hangs forever at exit.

    # This variant does exit:
    # os._exit(0)

    # This variant does not exit:
    sys.exit(0)

The idea of the script is to perform a ‘glob’ of the files over XRootD and then use the returned list to instantiate an RDataFrame. I pointed the example at my own files, but you can adapt it to any other files reachable over XRootD.
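The matching step of that idea can be tried without any XRootD server. The following is a minimal sketch, assuming a hard-coded directory listing in place of the real dirlist call; the path and the file names are hypothetical and only serve the illustration.

```python
import fnmatch
import os

# Hypothetical path part of an XRootD URL (server and parameters already stripped)
path_part = "/eos/user/x/example/TupleProcess*.root"

# Split into the directory to list and the glob pattern to apply to entry names
dir_path, pattern = os.path.split(path_part)

# Hypothetical stand-in for the entry names returned by a directory listing
listing = ["TupleProcess_0.root", "TupleProcess_1.root", "notes.txt", "Other.root"]

# Keep only the entries whose name matches the pattern
matches = [f"{dir_path}/{name}" for name in listing if fnmatch.fnmatch(name, pattern)]
print(matches)
```

Only the two TupleProcess files survive the filter; in the reproducer the same filter is applied to the names returned by the remote directory listing.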

The observed behaviour is the following:

  • creating an RDataFrame and running operations on it, with nothing else, leaves the process in a state that prevents the application from exiting

  • with os._exit(0) the script exits; with sys.exit(0) it does not
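For context on why the two calls can behave differently: sys.exit raises SystemExit and goes through the normal interpreter shutdown (atexit handlers, object finalization), while os._exit terminates the process immediately and skips that cleanup. The sketch below shows only this general Python difference, using a child process and an atexit handler. Whether the hang reported here actually happens during interpreter shutdown is an assumption, not something established in this thread.

```python
import subprocess
import sys

# Source of a small child script; {exit_call} is filled in per run
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran", flush=True))
print("work done", flush=True)
{exit_call}
"""


def run_child(exit_call):
    """Run the child script with the given exit call and return its output lines."""
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.stdout.splitlines()


# Normal shutdown: the atexit handler is executed
sys_exit_lines = run_child("sys.exit(0)")
# Immediate termination: the atexit handler is skipped
os_exit_lines = run_child("os._exit(0)")

print("sys.exit :", sys_exit_lines)
print("os._exit :", os_exit_lines)
```

If some cleanup step blocks during shutdown, os._exit would bypass it, which is consistent with the observation above but does not by itself identify the blocking resource.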

For my code I can add os._exit(0) to every application I have, but it would be good to understand the root cause of the problem and whether there are guidelines on how best to use the XRootD Python module and RDataFrame together. I know there is an open MR adding glob functionality for XRootD, but it would be good to have a test case that combines it with RDataFrame.

Thanks in advance,
Renato


Some details of the setup I have on lxplus:

   ------------------------------------------------------------------
  | Welcome to ROOT 6.36.04                        https://root.cern |
  | (c) 1995-2025, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Sep 07 2025, 21:26:37                 |
  | From tags/6-36-04@6-36-04                                        |
  | With                                                             |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

bash-5.2$ xrootd -v
v5.8.4

In [4]: from XRootD import client
   ...: 
   ...: # Create a dummy client or check version via module
   ...: print(client.__version__)
5.8.4

Thanks for reaching out! I think @vpadulan will take a look when he finds time for it (he’s very busy right now).

Dear @rquaglia ,

Thank you for reaching out! And thanks for the reproducer! I don’t have time right now to look into this, but I’ll try it. Meanwhile, I wanted to ask for a clarification. In your example, if you only write up to the print statement, with nothing else afterwards, does it get stuck forever? If so, how can it ever move on to the next lines with os._exit in the case where it’s not stuck?

Cheers,

Vincenzo

Hi @vpadulan, the event loop and all operations run and print the output; it is just that the Python process does not manage to exit unless I add os._exit() at the end.
Hope that clarifies it.

Ok, I see. Can I ask you then to try something else? Could you try putting these lines inside a separate function, then calling that function inside the if __name__ == "__main__" block? What I’m wondering is whether, at global scope, there is some resource which is not handled well and makes the application hang, whereas with all the code in a separate function the garbage collector should/could act more aggressively.
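The scoping difference behind this suggestion can be illustrated with a minimal sketch that does not need ROOT. It assumes CPython's reference counting, and the Resource class and run_analysis function are hypothetical stand-ins for the objects and code in the reproducer.

```python
import weakref


class Resource:
    """Hypothetical stand-in for an object holding external state."""

    def __init__(self, name):
        self.name = name


def run_analysis():
    # Local object: its reference count drops to zero when the function
    # returns, so CPython destroys it well before interpreter shutdown.
    res = Resource("local")
    return weakref.ref(res)


local_ref = run_analysis()

# Module-level object: stays alive until the interpreter tears down the module
global_res = Resource("global")
global_ref = weakref.ref(global_res)

local_alive = local_ref() is not None
global_alive = global_ref() is not None
print(f"local alive: {local_alive}, global alive: {global_alive}")
```

Applied to the reproducer, this would mean moving the xrootd_glob call, the RDataFrame construction and the Histo1D booking into a function, so that df and h are released when it returns rather than during interpreter shutdown. Whether that avoids the hang is exactly what the test is meant to find out.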