
ROOT TFile.Open in threaded Python3 program

Hi all,

I am playing around with ROOT I/O again and built some scripts that read remote ROOT files. The task consists of small random reads, which of course have very high latency on remote files, but I can live with that. To offset the latency, I want to use many parallel threads to keep many requests in flight at the same time. This works very well: I use multiprocessing.pool.ThreadPool (a real thread pool, not the normal multiprocessing process pool) and set metree.GetEntry._threaded = True.

Except there is one problem: to read a file from hundreds of threads, I open the file hundreds of times (I assume using the same TFile from multiple threads is a bad idea), and each open takes multiple seconds. So I tried setting ROOT.TFile.Open._threaded = True. It does seem to work, but ROOT really does not like TFile.Open being called from multiple threads at the same time. With a lock around the call on the Python side, I can at least release the GIL and overlap other operations, but opening hundreds of files sequentially is still a bottleneck.
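The lock-around-open pattern described above could be sketched roughly as follows. This is only an illustration: process_file, process_all, open_fn and read_fn are hypothetical names, and in the real script open_fn would wrap ROOT.TFile.Open and read_fn the per-file reads.

```python
import threading
from multiprocessing.pool import ThreadPool

# One lock shared by all worker threads: only one open runs at a time.
_open_lock = threading.Lock()

def process_file(name, open_fn, read_fn):
    # Serialize the open call, which is not safe to run concurrently here.
    with _open_lock:
        handle = open_fn(name)
    # Reads happen outside the lock, so they overlap across threads.
    return read_fn(handle)

def process_all(names, open_fn, read_fn, n_threads=8):
    # ThreadPool uses threads, so the callables need not be picklable.
    with ThreadPool(n_threads) as pool:
        args = [(name, open_fn, read_fn) for name in names]
        return pool.starmap(process_file, args)
```

The opens still run one after another, so this only hides the open latency behind reads already in progress on other threads.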

Is there a better way to parallelize the waiting time when opening remote files? I could go with a process Pool, but that seems quite a bit more painful than a threadpool.

Any hints are appreciated.

Cheers,

Marcel



ROOT Version: JupyROOT 6.18/04
Platform: CC7, CMSSW_11_1_PY3 environment
Compiler: g++ (GCC) 8.3.1 20190225


Hi,

Can you also add, at the beginning of your script (before creating any threads):

ROOT.ROOT.EnableThreadSafety()

After that, it should be ok to open and read multiple TFiles from multiple threads.

Hi @etejedor,

I tried adding ROOT.ROOT.EnableThreadSafety(). Without TFile.Open._threaded = True, the open still seems to hold the GIL, and with it I still get crashes. It might be that I am setting the option too late (this is in a Jupyter notebook); I am not sure.

Anyway, I found that I can use the AsyncOpen interface directly and essentially just reimplement TFile.Open in Python:

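import time
import ROOT

# ROOTPREFIX (the remote URL prefix) is defined elsewhere in the script.
def asyncopen(name, timeout):
    # Start the open without blocking; the handle is polled below.
    handle = ROOT.TFile.AsyncOpen(ROOTPREFIX + name)
    # Poll once per second while the open is in progress (kAOSInProgress == 1).
    while timeout > 0 and ROOT.TFile.GetAsyncOpenStatus(handle) == 1:
        time.sleep(1)
        timeout -= 1
    # Give up only if the open is still pending, not just because the counter hit zero.
    if ROOT.TFile.GetAsyncOpenStatus(handle) == 1:
        return None
    tfile = ROOT.TFile.Open(handle)
    # A null result is falsy in PyROOT, so check it before calling IsOpen().
    if tfile and tfile.IsOpen():
        return tfile
    return None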

This works well for me, might be useful for others.
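The same polling idea extends to many files at once: start every open first, then poll all the handles in one loop so the waits overlap. The sketch below is an assumption about how that could look, not part of the original solution. open_many and its start, status and finish parameters are hypothetical names, passed in as callables so the logic does not depend on ROOT.

```python
import time

IN_PROGRESS = 1  # same value as kAOSInProgress in the function above

def open_many(names, start, status, finish, timeout=60.0, interval=1.0):
    """Start all opens, then poll them together until done or timed out."""
    # Kick off every request up front so they are all in flight at once.
    pending = {name: start(name) for name in names}
    results = {}
    deadline = time.monotonic() + timeout
    while pending:
        for name in list(pending):
            if status(pending[name]) != IN_PROGRESS:
                # No longer in progress: hand the handle to finish().
                results[name] = finish(pending.pop(name))
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    # Anything still pending at the deadline is reported as None.
    for name in pending:
        results[name] = None
    return results
```

With ROOT, the callables would presumably be start=lambda n: ROOT.TFile.AsyncOpen(ROOTPREFIX + n), status=ROOT.TFile.GetAsyncOpenStatus and finish=ROOT.TFile.Open, mirroring the calls in asyncopen. The caller should still check IsOpen() on each returned file, since a finished request may have failed.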

Thanks,

Marcel

Ok thank you for sharing your solution!