With a 2.7GB file on the root server:
root -l https://root.cern.ch/files/lhcb2.root -e 'E->Draw("m_version")' -q 1.33s user 0.38s system 62% cpu 2.771 total
Not sure what you mean by “your server”. The file is on cernbox, can’t you download it?
By the way, I put it on an S3 bucket. It is faster, but still takes 11 seconds. So it seems to be mainly a problem with cernbox.
time root -b -l -q -e "TFile::Open(\"http://rgw.fisica.unimi.it/test-ruggero/test_ntuples_200123.root?AWSAccessKeyId=M06HBTUGIKXVXYH1RES6&Signature=hpX%2FNzIKINZd825AWEGw%2FuVQ4nU%3D&Expires=1693581796\"); Electrons_All->Draw(\"pt__NOSYS\")"
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
________________________________________________________
Executed in 11.22 secs fish external
usr time 426.22 millis 0.00 micros 426.22 millis
sys time 149.82 millis 875.00 micros 148.94 millis
When reading from cernbox, strace tells me most of the time is spent in futex. This is not the case when reading from my disk or from rgw.fisica.unimi.it.
So, there is something very wrong with the cooperation between ROOT and CERNBox.
@wiso I confirm that opening the test file from your “rgw” server takes 1.4 s and the drawing 13.6 s (which is still ten times longer than it should be, as it shouldn’t be longer than the opening time for such a small file).
Well, it seems that the problem sits in the ROOT C++ code … maybe @linev could also have some ideas.
I tried the “jsroot” and the plot comes quite fast (after 1 s I get the “jsroot” window and then after some 3 s I get the plot):
Of course, ROOT THttpServer is not used by cernbox.
Cernbox directly provides a way to open ROOT files with JSROOT: just clicking on a file opens a new tab with the JSROOT browser. But tree drawing is performed on the client side, which means all necessary data needs to be loaded to the client. Therefore performance depends on the connection speed between the client and the cernbox servers.
@linev 1 Gb/s ethernet connection is too slow for you? Note that both trials (native ROOT and “jsroot”) use exactly the same cernbox link and the difference is 3 minutes versus 3 seconds drawing time (for a file which is 4.5 MB long with 68k events in the tree).
3 min with normal ROOT C++ TTree::Draw? Really strange.
I tried uproot.
time python -c "import uproot; uproot.open('test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"
________________________________________________________
Executed in 344.48 millis fish external
usr time 368.35 millis 639.00 micros 367.71 millis
sys time 629.68 millis 88.00 micros 629.59 millis
time python -c "import uproot; uproot.open('http://rgw.fisica.unimi.it/test-ruggero/test_ntuples_200123.root?AWSAccessKeyId=M06HBTUGIKXVXYH1RES6&Signature=hpX%2FNzIKINZd825AWEGw%2FuVQ4nU%3D&Expires=1693581796').get('Electrons_All').arrays('pt__NOSYS')"
________________________________________________________
Executed in 763.77 millis fish external
usr time 444.30 millis 643.00 micros 443.65 millis
sys time 669.86 millis 96.00 micros 669.76 millis
time python -c "import uproot; uproot.open('https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"
It crashes:
raise uproot.deserialization.DeserializationError(
uproot.deserialization.DeserializationError: while reading
TBasket version None as uproot.models.TBasket.Model_TBasket (? bytes)
fNbytes: 218759168
fObjlen: 65798144
fDatime: 293105760
fKeylen: 32314
fCycle: 85
Members for TBasket: fNbytes?, fObjlen?, fDatime?, fKeylen?, fCycle?
attempting to get bytes 38380:38398
outside expected range 6085:8333 for this Chunk
in file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root
I think I have an idea.
cernbox does not honor the Range header in requests and always returns the full file content, even when it declares Accept-Ranges in the response headers.
JSROOT has a workaround: it requests the complete file content once and then reuses it. Of course, this does not work for large files.
ROOT does not have such a workaround, so for each small request it gets the full content again and again.
Therefore it may take a very long time to process such a file.
So I would not recommend using cernbox for such applications until the problem is fixed.
I submitted an issue through the cernbox feedback form; let's wait for their response.
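The suspected behaviour (a server that ignores the Range header and answers with the whole file) could be checked with a small probe like the one below. This is only a sketch using the Python standard library: the function names `classify_range_reply` and `probe` are made up for illustration, and it assumes that a server honoring the range answers with status 206 while one ignoring it answers with 200.

```python
import urllib.request


def classify_range_reply(status, body_len, requested_len):
    """Classify the reply to a 'Range: bytes=0-(requested_len-1)' request."""
    if status == 206 and body_len <= requested_len:
        return "partial"      # the server honored the Range header
    if status == 200:
        return "full"         # Range ignored, the whole file is being sent
    return "unexpected"       # e.g. an error status or an oversized 206 body


def probe(url, nbytes=16):
    """Ask for the first nbytes of url and report how the server answered.

    Returns (classification, value of the Accept-Ranges response header).
    """
    request = urllib.request.Request(
        url, headers={"Range": f"bytes=0-{nbytes - 1}"}
    )
    with urllib.request.urlopen(request) as response:
        # Read at most one byte more than requested, so that a server
        # sending the whole file does not force a full download here.
        body = response.read(nbytes + 1)
        return (
            classify_range_reply(response.status, len(body), nbytes),
            response.headers.get("Accept-Ranges"),
        )
```

Calling `probe(...)` on a file URL and getting `"full"` together with `Accept-Ranges: bytes` would match the behaviour suspected here.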
If you are right, then it’s not just about CERNBox but maybe about all similar ownCloud-based servers?
A possible workaround is to use xrootd rather than http(s) for cernbox, which is usually possible.
I am not sure how @wiso's URL in particular translates to an xrootd path, but in general for a URL such as https://cernbox.cern.ch/files/spaces/eos/project/r/root-eos/public/hsimple.root the equivalent xrootd URL is root://eosproject.cern.ch//eos/project/r/root-eos/public/hsimple.root (requires valid credentials, e.g. an active kerberos ticket).
Other locations on cernbox will use eospublic.cern.ch, eosuser.cern.ch or similar rather than eosproject.
I hope this helps,
Enrico
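The URL translation described above could be sketched as follows. This is a guess built from the single example in the post, not an official mapping: the helper name `cernbox_to_xrootd` is hypothetical, and deriving the host as "eos" + area + ".cern.ch" is an assumption that only matches the `eosproject` example (and `eosuser` by analogy). Pass `host` explicitly for anything else.

```python
from urllib.parse import urlparse

# Path prefix seen in the cernbox URL of the example above.
CERNBOX_PREFIX = "/files/spaces"


def cernbox_to_xrootd(url, host=None):
    """Translate a cernbox '/files/spaces/eos/...' URL into an xrootd URL."""
    path = urlparse(url).path
    if not path.startswith(CERNBOX_PREFIX + "/eos/"):
        raise ValueError(f"do not know how to translate {url!r}")
    eos_path = path[len(CERNBOX_PREFIX):]        # "/eos/project/r/..."
    if host is None:
        # ASSUMPTION: "/eos/project/..." lives on eosproject.cern.ch,
        # "/eos/user/..." on eosuser.cern.ch, and so on.
        area = eos_path.split("/")[2]
        host = f"eos{area}.cern.ch"
    # xrootd URLs keep the absolute path, hence the double slash after the host.
    return f"root://{host}/{eos_path}"


print(cernbox_to_xrootd(
    "https://cernbox.cern.ch/files/spaces/eos/project/r/root-eos/public/hsimple.root"
))
```

Public-link URLs of the `remote.php/dav/public-files/...` kind are deliberately rejected, since the thread does not say how they map to an EOS path.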
For production, EOS is a solution, but not for inspecting ROOT files on the cernbox website with JSROOT.
Whether some particular file from some specific site can be accessed via xrootd (e.g., from EOS) instead of https (e.g., from CERNBox) is irrelevant to this discussion (though you could be interested in this thread: “Frequent failure to update ROOTfiles at /eos/”).
What @linev reports is a serious issue that, to his knowledge, users face with many different https servers.
Apparently, https servers either explicitly say they DO NOT Accept-Ranges or, worse, they say they DO but then send the whole file upon every request.
So, I think the relevant ROOT C++ code should be protected against such cases.
At first request, it should automatically detect that it got the whole file (regardless of whether the server claimed it accepted Ranges) and then reuse the provided file (just like “jsroot” does now).
If the first request returned “partial content”, ROOT should use the Ranges feature.
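The fallback proposed above can be illustrated with a few lines of Python. This is not ROOT code and not how JSROOT is implemented. It is only a sketch of the idea, with a hypothetical `CachingRangeReader` class and a hypothetical `fetch(start, end)` callable standing in for the HTTP layer, which is assumed to return the status code and the body.

```python
class CachingRangeReader:
    """Read byte ranges, falling back to a cached full copy if needed."""

    def __init__(self, fetch):
        # fetch(start, end) -> (status, body) performs one ranged request
        # for bytes start..end inclusive (hypothetical transport callable).
        self._fetch = fetch
        self._whole = None    # full file content, once the server sent it
        self.requests = 0     # number of requests actually sent

    def read(self, start, length):
        if self._whole is not None:
            # The server ignored Range before: serve from the cached copy.
            return self._whole[start:start + length]
        self.requests += 1
        status, body = self._fetch(start, start + length - 1)
        if status == 206:
            return body       # "partial content": ranges work, keep using them
        if status == 200:
            # The whole file came back: keep it and slice locally from now on.
            self._whole = body
            return body[start:start + length]
        raise IOError(f"unexpected HTTP status {status}")
```

With a server that ignores ranges, only the first `read` triggers a transfer. As noted earlier in the thread, keeping the whole file in memory is not an option for large files.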
@wiso Maybe you could ping uproot developers about it.
I disagree that pointing out a possible workaround for the user’s problem (or for other users that end up here with a similar problem) is irrelevant to the discussion, but I completely agree that the underlying issue needs attention. I raised the point in ROOT’s I/O mattermost channel yesterday.
Record not found. Is that expected?
Updated the link. Still, it will only be visible with a CERN account :-/ It’s about potentially re-writing cernbox URLs to xrootd URLs.
As per the discussion in the SNOW ticket linked above, it turns out EOS does support range requests, and it’s unclear (to me at least) what is causing issues for ROOT I/O.
I opened “Slow reads via HTTP from EOS” (Issue #13018 in root-project/root on GitHub) so as not to lose track of the issue.