Reading from http very slow

Of course, ROOT THttpServer is not used by cernbox.

Cernbox directly provides a way to open a ROOT file with JSROOT: just clicking on the file opens a new tab with the JSROOT browser. But tree drawing is performed on the client side, which means all necessary data has to be loaded to the client. Performance therefore depends on the connection speed between the client and the cernbox servers.

@linev Is a 1 Gb/s ethernet connection too slow for you? Note that both trials (native ROOT and “jsroot”) use exactly the same cernbox link, and the difference is 3 minutes versus 3 seconds of drawing time (for a 4.5 MB file with 68k events in the tree).

3 min with normal ROOT C++ TTree::Draw? Really strange.

I tried uproot.

  1. From disk
time python -c "import uproot; uproot.open('test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"

________________________________________________________
Executed in  344.48 millis    fish           external
   usr time  368.35 millis  639.00 micros  367.71 millis
   sys time  629.68 millis   88.00 micros  629.59 millis
  2. From rgw.fisica.unimi.it
time python -c "import uproot; uproot.open('http://rgw.fisica.unimi.it/test-ruggero/test_ntuples_200123.root?AWSAccessKeyId=M06HBTUGIKXVXYH1RES6&Signature=hpX%2FNzIKINZd825AWEGw%2FuVQ4nU%3D&Expires=1693581796').get('Electrons_All').arrays('pt__NOSYS')"

________________________________________________________
Executed in  763.77 millis    fish           external
   usr time  444.30 millis  643.00 micros  443.65 millis
   sys time  669.86 millis   96.00 micros  669.76 millis
  3. From cernbox
time python -c "import uproot; uproot.open('https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root').get('Electrons_All').arrays('pt__NOSYS')"

It crashes:

    raise uproot.deserialization.DeserializationError(
uproot.deserialization.DeserializationError: while reading

    TBasket version None as uproot.models.TBasket.Model_TBasket (? bytes)
        fNbytes: 218759168
        fObjlen: 65798144
        fDatime: 293105760
        fKeylen: 32314
        fCycle: 85
Members for TBasket: fNbytes?, fObjlen?, fDatime?, fKeylen?, fCycle?

attempting to get bytes 38380:38398
outside expected range 6085:8333 for this Chunk
in file https://cernbox.cern.ch/remote.php/dav/public-files/1Cy1HIf03Ca76Dm/test_ntuples_200123.root

I think I have an idea.

cernbox does not honor Range headers in requests and always returns the full file content, even when it declares Accept-Ranges in the response headers.
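The behaviour described above can be checked from the client side. The following is a minimal sketch using only the Python standard library; the function names `classify_range_response` and `probe` are hypothetical helpers made up for illustration, not part of ROOT, JSROOT or uproot. It sends one small Range request and looks at whether the server answered with 206 (Partial Content) or with 200 and the whole file.

```python
import urllib.request


def classify_range_response(status, headers, requested_bytes):
    """Classify how a server answered a request for `requested_bytes` bytes.

    `headers` is any mapping of header names to values.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 206 and "content-range" in lowered:
        return "range honoured"
    if status == 200:
        # A 200 answer to a Range request means the Range header was ignored.
        length = lowered.get("content-length")
        if length is not None and int(length) > requested_bytes:
            return "range ignored: full file returned"
        return "range ignored"
    return f"unexpected status {status}"


def probe(url, first=0, last=99):
    """Request bytes first..last of `url` and classify the answer."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})
    with urllib.request.urlopen(req) as resp:
        return classify_range_response(
            resp.status, dict(resp.headers.items()), last - first + 1
        )
```

Calling `probe(url)` on a file URL performs a real network request. Note that the Accept-Ranges header is deliberately not consulted: as reported above, a server may advertise it and still send the full content.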

JSROOT has a workaround: it requests the complete file content once and then reuses it. Of course, this does not work for large files.

ROOT has no such workaround, so for each small request it gets the full content again and again.
It may therefore take a very long time to process such a file.

Therefore I do not recommend using cernbox for such applications until the problem is fixed.

I submitted an issue via the cernbox feedback form; let's wait for their response.

If you are right, then it’s not just about CERNBox but maybe about all similar ownCloud-based servers?

A possible workaround is to use xrootd rather than http(s) for cernbox, which is usually possible.

I am not sure how @wiso's URL in particular translates to an xrootd path, but in general, for a URL such as https://cernbox.cern.ch/files/spaces/eos/project/r/root-eos/public/hsimple.root the equivalent xrootd URL is root://eosproject.cern.ch//eos/project/r/root-eos/public/hsimple.root (requires valid credentials, e.g. an active kerberos ticket).

Other locations on cernbox will use eospublic.cern.ch, eosuser.cern.ch or similar rather than eosproject.
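As an illustration, the translation for the one URL pattern quoted above could be sketched as follows. This covers only the project-space example from this post; the function name is hypothetical, and the mapping for other locations (eospublic, eosuser, public share links) is not known here, so the function refuses anything else rather than guessing.

```python
from urllib.parse import urlparse

# Assumption: only the "/files/spaces/eos/project/..." pattern from the
# example above is handled; it maps to the eosproject.cern.ch host.
PROJECT_PREFIX = "/files/spaces/eos/project/"


def cernbox_project_url_to_xrootd(url):
    """Translate a cernbox project-space URL into an xrootd URL."""
    parsed = urlparse(url)
    if parsed.netloc != "cernbox.cern.ch" or not parsed.path.startswith(PROJECT_PREFIX):
        raise ValueError("only the project-space URL pattern is handled")
    # Drop "/files/spaces" so that the path starts with "/eos/project/...".
    eos_path = parsed.path[len("/files/spaces"):]
    # The double slash after the host is part of the xrootd URL syntax.
    return f"root://eosproject.cern.ch/{eos_path}"


print(cernbox_project_url_to_xrootd(
    "https://cernbox.cern.ch/files/spaces/eos/project/r/root-eos/public/hsimple.root"
))
```

As noted above, opening the resulting URL still requires valid credentials.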

I hope this helps,
Enrico

For production, EOS is a solution, but not for inspecting ROOT files on the cernbox website with JSROOT.

Whether some particular file from some specific site can be accessed via xrootd (e.g., from EOS) instead of https (e.g., from CERNBox) is irrelevant to this discussion (though you could be interested in this thread: “Frequent failure to update ROOTfiles at /eos/”).

What @linev reports is a serious issue that, to his knowledge, users face with many different https servers.

Apparently, https servers either explicitly say they DO NOT Accept-Ranges or, worse, they say they DO but then send the whole file upon every request.

So, I think the relevant ROOT C++ code should be protected against such cases.
On the first request, it should automatically detect that it got the whole file (regardless of whether the server claimed it accepted Ranges) and then reuse the provided file (just like “jsroot” does now).
If the first request returned “partial content”, ROOT should use the Ranges feature.
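The proposed logic can be sketched in a few lines of Python. This is not ROOT's or JSROOT's actual code, only an illustration of the idea; `RangeOrWholeReader` and the two fake servers are hypothetical names, and `fetch(start, stop)` stands in for whatever performs the HTTP request and returns the status code and body.

```python
class RangeOrWholeReader:
    """Read byte ranges, falling back to a cached full file if needed."""

    def __init__(self, fetch):
        # fetch(start, stop) -> (http_status, body_bytes); hypothetical callable
        self._fetch = fetch
        self._whole = None   # full file content, once the server has sent it
        self.requests = 0    # number of requests actually sent

    def read(self, start, stop):
        if self._whole is not None:
            # The server already sent everything: serve from memory.
            return self._whole[start:stop]
        status, body = self._fetch(start, stop)
        self.requests += 1
        if status == 206:
            # Partial content: the server honoured the Range header.
            return body
        if status == 200:
            # Whole file returned despite the Range header: keep and reuse it.
            self._whole = body
            return body[start:stop]
        raise IOError(f"unexpected HTTP status {status}")


# Two fake servers for demonstration only.
DATA = bytes(range(256)) * 4


def honest_server(start, stop):
    return 206, DATA[start:stop]


def ignoring_server(start, stop):
    return 200, DATA


reader = RangeOrWholeReader(ignoring_server)
first = reader.read(10, 20)
second = reader.read(500, 510)
print(reader.requests)  # the second read is served from the cached copy
```

Deciding on the status code of the actual response, rather than on the Accept-Ranges header, is what protects against servers that advertise range support but do not honour it. As with the JSROOT workaround, keeping the whole file in memory is only reasonable for small files.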

@wiso Maybe you could ping uproot developers about it.

I disagree that pointing out a possible workaround for the user’s problem (or for other users that end up here with a similar problem) is irrelevant to the discussion, but I completely agree that the underlying issue needs attention. I raised the point in ROOT’s I/O mattermost channel yesterday.

FYI I’ve opened a ticket on the CERN Service Portal (SNOW).

Record not found. Is that expected?

Updated the link. Still, it will only be visible with a CERN account :-/ It’s about potentially rewriting cernbox URLs to xrootd URLs.

As per the discussion in the SNOW ticket linked above, it turns out EOS does support range requests, and it’s unclear (to me at least :sweat_smile: ) what is causing issues for ROOT I/O.

I opened “Slow reads via HTTP from EOS” (issue #13018 in root-project/root on GitHub) so as not to lose track of the issue.