Reading from http very slow

Reading a ROOT file directly over https is very slow compared to downloading it and using it locally.
As a test I am making a very simple plot:

Electrons_All->Draw("pt__NOSYS")

If I open the file from https directly, e.g. with

root -l https://cernbox.cern.ch/remote.php/dav/public-files/ntHJKDVRTjlSkyS/test_ntuples_200123.root

making the plot takes ~5 minutes (the ntuple is very small, 60k events).
In contrast, if I download the file first, making the plot takes ~0 s. Downloading it with wget takes 0.6 s (4.5 MB):

wget https://cernbox.cern.ch/remote.php/dav/public-files/ntHJKDVRTjlSkyS/test_ntuples_200123.root
root -l test_ntuples_200123.root 
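The download-first workaround can also be scripted. Below is a minimal sketch using only the Python standard library; the helper name `fetch_to_temp` is made up for illustration and is not part of ROOT. It copies the remote file to a temporary local file in one sequential transfer, which is what wget does above.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_to_temp(url, suffix=".root"):
    """Download `url` in a single sequential transfer and return the local path.

    Hypothetical helper: the caller is responsible for deleting the file
    once it is no longer needed.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Stream the response body to disk instead of holding it all in memory.
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
        shutil.copyfileobj(resp, out)
    return path
```

The returned path can then be opened like any local file, for example with `root -l` as in the commands above.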

ROOT Version: 6.26/10
Platform: Linux fedora 6.0.18-300.fc37.x86_64
Compiler: gcc 12.2.1


Reading a ROOT file directly over an HTTPS connection can be slow because the data must be transferred over the network before it can be processed. This can add significant overhead and latency, particularly for large files. When you download the file locally, the data is already on your machine and can be processed much faster.

@couet This “test” file is just 4.5 MB long.
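For scale, the timings from the first post can be turned into effective throughput. This is only a rough sketch: the function name is made up for illustration, and it pessimistically assumes the whole 4.5 MB file had to be transferred for the plot, even though Draw reads a single branch.

```python
def effective_throughput_mb_s(size_mb, seconds):
    """Average transfer rate in MB/s for `size_mb` megabytes moved in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds


FILE_SIZE_MB = 4.5            # file size quoted in the first post
WGET_SECONDS = 0.6            # wget download time quoted in the first post
ROOT_HTTPS_SECONDS = 5 * 60   # "~5 minutes" for the Draw() over https

wget_rate = effective_throughput_mb_s(FILE_SIZE_MB, WGET_SECONDS)
root_rate = effective_throughput_mb_s(FILE_SIZE_MB, ROOT_HTTPS_SECONDS)

print(f"wget: {wget_rate:.1f} MB/s, ROOT over https: {root_rate:.3f} MB/s")
print(f"slowdown: about {wget_rate / root_rate:.0f}x")
```

A gap of this size on the same link is hard to attribute to raw bandwidth alone.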

Maybe @pcanal has some ideas about it.

Has anybody ever looked at it?

@wiso Can you provide the “test file” again (in the same place) so that one can test the new ROOT version(s)?

I haven’t. Maybe @pcanal has an idea.

Here is a valid link:

root -l https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root 

It gives me:

% time root -l -q https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root 

Attaching file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root as _file0...
(TFile *) 0x7fd8f39ef000
root -l -q   0.81s user 0.24s system 10% cpu 9.783 total
%

Is that too long?

ROOT 6.28/04 on an Ubuntu 18.04 LTS / x86_64 machine (with 1 Gb/s ethernet; wget reported a 10 MB/s download speed for the “test_ntuples_200123.root” file) …

[...]$ time root -b -l -q https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root

Attaching file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root as _file0...
(TFile *) 0x55c224c3b950

real    0m8.699s <- 9 seconds to open a 4.5 MB file !!!
user    0m0.425s
sys     0m0.122s
[...]$ time root -b -l -q https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root  -e 'Electrons_All->Draw("pt__NOSYS");'

Attaching file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root as _file0...
(TFile *) 0x55d006f19b10
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1

real    3m34.464s <- WTF ???
user    0m0.899s
sys     0m0.226s
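The three `time` fields above already hint at where the time goes. Here is a small sketch (helper names are made up for illustration) that converts the bash `time` format to seconds and computes how much of the wall-clock time the process actually spent on the CPU:

```python
import re


def parse_time_field(text):
    """Convert a bash `time` duration such as '3m34.464s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def cpu_fraction(real, user, system):
    """Fraction of wall-clock time spent on the CPU (user + sys over real)."""
    cpu_seconds = parse_time_field(user) + parse_time_field(system)
    return cpu_seconds / parse_time_field(real)


# Figures copied from the Draw() run above.
busy = cpu_fraction("3m34.464s", "0m0.899s", "0m0.226s")
print(f"CPU busy for {busy:.1%} of the wall-clock time")
```

With these figures the process is on the CPU for well under 1% of the run, so almost all of the 3.5 minutes is spent waiting rather than computing.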

Could it be a CERNBox issue? Opening a file from the ROOT server is fast:

% time root -l https://root.cern.ch/files/usa.root -q

Attaching file https://root.cern.ch/files/usa.root as _file0...
(TFile *) 0x7f7e1109fc00
root -l https://root.cern.ch/files/usa.root -q  0.80s user 0.23s system 87% cpu 1.183 total

It seems you are only opening the file. The issue is when reading the data to plot the histogram.

It is still fast if I draw something from that file:

% time root -l https://root.cern.ch/files/usa.root -q -e 'texas->Draw("AL")'

Attaching file https://root.cern.ch/files/usa.root as _file0...
(TFile *) 0x7fa8050c4c00
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
root -l https://root.cern.ch/files/usa.root -q -e 'texas->Draw("AL")'  1.19s user 0.35s system 72% cpu 2.131 total

Right, but @Wile_E_Coyote reported 3 minutes with my file. Do you think it is a cernbox issue? I don’t have any issue when downloading that file with wget from cernbox.

@couet For test purposes, can you copy the “test_ntuples_200123.root” to your “root” server? We could then test it.

I am not sure it is a CERNBox issue; I am just guessing. Yes, we should try with a bigger file. I am not sure I have the right to copy that file to the ROOT server.

With a 2.7GB file on the root server:

root -l https://root.cern.ch/files/lhcb2.root -e 'E->Draw("m_version")' -q  1.33s user 0.38s system 62% cpu 2.771 total

Not sure what you mean by “your server”. The file is on CERNBox; can’t you download it?
By the way, I put it on an S3 bucket. It is faster, but still takes 11 seconds. So it seems to be mainly a CERNBox problem.

time root -b -l -q -e "TFile::Open(\"http://rgw.fisica.unimi.it/test-ruggero/test_ntuples_200123.root?AWSAccessKeyId=M06HBTUGIKXVXYH1RES6&Signature=hpX%2FNzIKINZd825AWEGw%2FuVQ4nU%3D&Expires=1693581796\"); Electrons_All->Draw(\"pt__NOSYS\")"

Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1

________________________________________________________
Executed in   11.22 secs      fish           external
   usr time  426.22 millis    0.00 micros  426.22 millis
   sys time  149.82 millis  875.00 micros  148.94 millis

When reading from CERNBox, strace tells me most of the time is spent in futex. This is not the case when reading from my disk or from rgw.fisica.unimi.it.

So, there is something very wrong with the cooperation between ROOT and CERNBox.

@wiso I confirm that opening the test file from your “rgw” server takes 1.4 s and the drawing 13.6 s (which is still ten times longer than it should be, as it shouldn’t be longer than the opening time for such a small file).

Well, it seems that the problem sits in the ROOT C++ code … maybe @linev also has some ideas.

I tried “jsroot” and the plot comes up quite fast (after 1 s I get the “jsroot” window and then after some 3 s I get the plot):

https://jsroot.gsi.de/dev/?file=https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root&item=Electrons_All;9/pt__NOSYS&opt=