Of course, cernbox does not use ROOT's THttpServer.
Cernbox directly provides a way to open a ROOT file with JSROOT: clicking on the file opens a new tab with the JSROOT browser. But tree drawing is performed on the client side, which means all the necessary data has to be loaded to the client. Performance therefore depends on the connection speed between the client and the cernbox servers.
@linev Is a 1 Gb/s ethernet connection too slow for you? Note that both trials (native ROOT and “jsroot”) use exactly the same cernbox link, and the difference is 3 minutes versus 3 seconds of drawing time (for a file which is 4.5 MB in size with 68k events in the tree).
time python -c "import uproot; uproot.open('test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"
________________________________________________________
Executed in  344.48 millis    fish           external
   usr time  368.35 millis  639.00 micros  367.71 millis
   sys time  629.68 millis   88.00 micros  629.59 millis
From rgw.fisica.unimi.it:
time python -c "import uproot; uproot.open('http://rgw.fisica.unimi.it/test-ruggero/test_ntuples_200123.root?AWSAccessKeyId=M06HBTUGIKXVXYH1RES6&Signature=hpX%2FNzIKINZd825AWEGw%2FuVQ4nU%3D&Expires=1693581796').get('Electrons_All').arrays('pt__NOSYS')"
________________________________________________________
Executed in  763.77 millis    fish           external
   usr time  444.30 millis  643.00 micros  443.65 millis
   sys time  669.86 millis   96.00 micros  669.76 millis
From cernbox:
time python -c "import uproot; uproot.open('https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"
It crashes:
raise uproot.deserialization.DeserializationError(
uproot.deserialization.DeserializationError: while reading
TBasket version None as uproot.models.TBasket.Model_TBasket (? bytes)
fNbytes: 218759168
fObjlen: 65798144
fDatime: 293105760
fKeylen: 32314
fCycle: 85
Members for TBasket: fNbytes?, fObjlen?, fDatime?, fKeylen?, fCycle?
attempting to get bytes 38380:38398
outside expected range 6085:8333 for this Chunk
in file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root
Cernbox does not honor Range headers in requests and always returns the full file content,
even though it declares Accept-Ranges in the response headers.
JSROOT has a workaround: it requests the complete file content once and then reuses it. Of course, this does not work for large files.
ROOT has no such workaround, so for each small request it gets the full content again and again.
It can therefore take a very long time to process such a file.
So I would not recommend using cernbox for such applications until the problem is fixed.
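Whether a given server behaves this way can be checked from the client side. Below is a minimal Python sketch using only the standard library. The helper names `classify_range_reply` and `probe_range_support` are made up for this illustration and are not part of ROOT, JSROOT or uproot. It asks for the first few bytes of a URL and reports whether the reply was really partial.

```python
import urllib.request


def classify_range_reply(status, body_len, requested_len):
    """Classify the reply to a 'Range: bytes=0-(requested_len - 1)' request."""
    if status == 206 and body_len <= requested_len:
        return "partial"   # server honored the range
    if status == 200 or body_len > requested_len:
        return "full"      # range ignored, (at least) more than requested was sent
    return "unknown"


def probe_range_support(url, requested_len=16):
    """Request the first bytes of `url` and report how the server answered."""
    headers = {"Range": f"bytes=0-{requested_len - 1}"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        # Read one byte more than requested: enough to spot an oversized reply
        # without downloading the whole file.
        body = response.read(requested_len + 1)
        return classify_range_reply(response.status, len(body), requested_len)
```

Calling `probe_range_support(url)` on a server that ignores Range should return "full", regardless of what the Accept-Ranges response header says.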
A possible workaround is to use xrootd rather than http(s) for cernbox, which is usually possible.
I am not sure how @wiso's URL in particular translates to an xrootd path, but in general, for a URL such as https://cernbox.cern.ch/files/spaces/eos/project/r/root-eos/public/hsimple.root the equivalent xrootd URL is root://eosproject.cern.ch//eos/project/r/root-eos/public/hsimple.root (requires valid credentials, e.g. an active kerberos ticket).
Other locations on cernbox will use eospublic.cern.ch, eosuser.cern.ch or similar rather than eosproject.
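The translation in the example above can be written down as a small Python helper. This is only a sketch: `cernbox_to_xrootd` and `DEFAULT_EOS_HOSTS` are hypothetical names, and the only host mapping included is the "project" one taken from the example. Hosts for other areas are not guessed here and would have to be supplied by the caller.

```python
from urllib.parse import urlparse

# Only the "project" entry comes from the example in this thread; entries for
# other areas are deliberately left out and must be added by the caller.
DEFAULT_EOS_HOSTS = {"project": "eosproject.cern.ch"}


def cernbox_to_xrootd(url, eos_hosts=DEFAULT_EOS_HOSTS):
    """Translate a cernbox '/files/spaces/eos/...' URL into an xrootd URL."""
    prefix = "/files/spaces"
    path = urlparse(url).path
    if not path.startswith(prefix + "/eos/"):
        raise ValueError(f"not a '/files/spaces/eos/...' URL: {url}")
    eos_path = path[len(prefix):]      # e.g. "/eos/project/r/root-eos/..."
    area = eos_path.split("/")[2]      # e.g. "project"
    if area not in eos_hosts:
        raise ValueError(f"no EOS host known for area '{area}'")
    # The double slash after the host is intentional: host + absolute path.
    return f"root://{eos_hosts[area]}/{eos_path}"
```

Public-link URLs of the `remote.php/dav/public-files/...` form do not follow this pattern, so the helper rejects them rather than guessing.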
Whether some particular file from some specific site can be accessed via xrootd (e.g., from EOS) instead of https (e.g., from CERNBox) is irrelevant to this discussion.
What @linev reports is a serious issue that, to his knowledge, users face with many different https servers.
Apparently, https servers either explicitly say they DO NOT Accept-Ranges or, worse, they say they DO but then send the whole file upon every request.
So, I think the relevant ROOT C++ code should be protected against such cases.
On the first request, it should automatically detect that it got the whole file (regardless of whether the server claimed it accepted Ranges) and then reuse the provided file (just like “jsroot” does now).
If the first request returned “partial content”, ROOT should use the Ranges feature.
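The logic proposed here can be sketched in a few lines. The sketch is in Python rather than ROOT's C++ and is not how ROOT or JSROOT actually implement it. `RangeReader` and its `fetch(start, stop)` hook are hypothetical: the hook stands in for an HTTP(S) request carrying a Range header and returns the status code and the body.

```python
class RangeReader:
    """Read byte ranges, reusing a cached full copy if the server ignores Range."""

    def __init__(self, fetch):
        # fetch(start, stop) -> (status, body) is a hypothetical transport hook;
        # a real one would send 'Range: bytes=start-(stop - 1)' over HTTP(S).
        self._fetch = fetch
        self._whole_file = None
        self.requests_sent = 0

    def read(self, start, stop):
        # Once the whole file has been received, never go back to the server.
        if self._whole_file is not None:
            return self._whole_file[start:stop]
        status, body = self._fetch(start, stop)
        self.requests_sent += 1
        if status not in (200, 206):
            raise IOError(f"unexpected HTTP status {status}")
        if status == 206 and len(body) <= stop - start:
            return body                # genuine partial content
        # Anything else is treated as the complete file, whatever the server
        # claimed about Accept-Ranges: keep it and slice locally from now on.
        self._whole_file = body
        return body[start:stop]
```

With a server that ignores Range, only the first `read` triggers a download; later calls are served from memory. As noted earlier in the thread, this only makes sense for files small enough to keep in memory.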
@wiso Maybe you could ping uproot developers about it.
I disagree that pointing out a possible workaround for the user’s problem (or for other users who end up here with a similar problem) is irrelevant to the discussion, but I completely agree that the underlying issue needs attention. I raised the point in ROOT’s I/O Mattermost channel yesterday.