Error when streaming ROOT files via XRootD

Dear ROOT experts,

I am currently struggling with streaming ROOT files via XRootD.
I am using the LCG 95 stack (/cvmfs/sft.cern.ch/lcg/views/LCG_95/x86_64-centos7-gcc8-opt/bin/root).

ROOT Version: 6.16/00
Built for linuxx8664gcc on Jan 23 2019, 09:06:13
Platform: CentOS7

A minimal example of what I am trying to do:

import ROOT

inputfiles = [
    'root://cmsxrootd-kit.gridka.de:1094//store/user/sbrommer/smhtt/2016/ntuples/DYJetsToLLM50_RunIISummer16MiniAODv3_PUMoriond17_13TeV_MINIAOD_madgraph-pythia8_ext2-v2/DYJetsToLLM50_RunIISummer16MiniAODv3_PUMoriond17_13TeV_MINIAOD_madgraph-pythia8_ext2-v2.root'
]
friend_inputfiles_collection = [[
    'root://cmsxrootd-kit.gridka.de:1094//store/user/sbrommer/smhtt/2016/friends/NNScore/emb_ff_stage1_fix/DYJetsToLLM50_RunIISummer16MiniAODv3_PUMoriond17_13TeV_MINIAOD_madgraph-pythia8_ext2-v2/DYJetsToLLM50_RunIISummer16MiniAODv3_PUMoriond17_13TeV_MINIAOD_madgraph-pythia8_ext2-v2.root'
]]
folder = "tt_nominal/ntuple"
name = '#tt#tt_emb#ZL#smhtt#Run2016#tt_max_score#125#'
prename = 'tt_max_score'
cut ='(tt_max_index==0)'
# Build the main chain and attach the friend chain(s)
tree = ROOT.TChain()
for inputfile in inputfiles:
    tree.Add(inputfile + "/" + folder)

friend_trees = []
if friend_inputfiles_collection is not None:
    for friend_inputfiles in friend_inputfiles_collection:
        friend_tree = ROOT.TChain()
        for friend_inputfile in friend_inputfiles:
            friend_tree.Add(friend_inputfile + "/" + folder)
        tree.AddFriend(friend_tree)
        friend_trees.append(friend_tree)

# Create the histogram and fill it with TTree::Draw ("goff" = no graphics)
counthist = ROOT.TH1D(name, name, 17, 0.0, 1.0)
tree.Draw("{}>>{}".format(prename, name), cut, "goff")

output_file = ROOT.TFile('test.root', "recreate")
counthist.Write()
output_file.Close()

Whether the code above works depends on the size of the cut in the Draw() command. For a short cut string this example works perfectly, but if I use a larger cut string with many different variables,

cut = "(tt_max_index==0)*(flagMETFilter==1)*(extraelec_veto<0.5)*(extramuon_veto<0.5)*(dilepton_veto<0.5)*(byVLooseDeepTau2017v2p1VSmu_1>0.5 && byVLooseDeepTau2017v2p1VSmu_2>0.5)*(byVVLooseDeepTau2017v2p1VSe_1>0.5 && byVVLooseDeepTau2017v2p1VSe_2>0.5)*(byTightDeepTau2017v2p1VSjet_1>0.5)*(byTightDeepTau2017v2p1VSjet_2>0.5)*(q_1*q_2<0)*(trg_doubletau==1)*(!(gen_match_1==5 && gen_match_2==5) && !(gen_match_1 == 6 || gen_match_2 == 6))*(generatorWeight)*(isoWeight_1*isoWeight_2)*(idWeight_1*idWeight_2)*(puweight)*(trackWeight_1*trackWeight_2)*(eleTauFakeRateWeight*muTauFakeRateWeight)*((((abs(eta_2)<2.1)*((byTightDeepTau2017v2p1VSjet_1<0.5 && byVLooseDeepTau2017v2p1VSjet_1>0.5)*crossTriggerDataEfficiencyWeight_vloose_DeepTau_1 + (byTightDeepTau2017v2p1VSjet_1>0.5)*crossTriggerDataEfficiencyWeight_tight_DeepTau_1))*((abs(eta_2)<2.1)*((byTightDeepTau2017v2p1VSjet_2<0.5 && byVLooseDeepTau2017v2p1VSjet_2>0.5)*crossTriggerDataEfficiencyWeight_vloose_DeepTau_2 + (byTightDeepTau2017v2p1VSjet_2>0.5)*crossTriggerDataEfficiencyWeight_tight_DeepTau_2)))/(((abs(eta_2)<2.1)*((byTightDeepTau2017v2p1VSjet_1<0.5 && byVLooseDeepTau2017v2p1VSjet_1>0.5)*crossTriggerMCEfficiencyWeight_vloose_DeepTau_1 + (byTightDeepTau2017v2p1VSjet_1>0.5)*crossTriggerMCEfficiencyWeight_tight_DeepTau_1))*((abs(eta_2)<2.1)*((byTightDeepTau2017v2p1VSjet_2<0.5 && byVLooseDeepTau2017v2p1VSjet_2>0.5)*crossTriggerMCEfficiencyWeight_vloose_DeepTau_2 + (byTightDeepTau2017v2p1VSjet_2>0.5)*crossTriggerMCEfficiencyWeight_tight_DeepTau_2))))*(1.0)*(1.0)*(prefiringweight)*((((gen_match_1 == 5)*(((decayMode_1!=11)*tauIDScaleFactorWeight_tight_DeepTau2017v2p1VSjet_1)+((decayMode_1==11)*0.89484048)) + (gen_match_1 != 5))*((gen_match_2 == 5)*(((decayMode_2!=11)*tauIDScaleFactorWeight_tight_DeepTau2017v2p1VSjet_2)+((decayMode_2==11)*0.89484048)) + (gen_match_2 != 5))))*(zPtReweightWeight)*(((genbosonmass >= 50.0) * 4.1545e-05*((npartons == 0 || npartons >= 5)*1.0+(npartons == 1)*0.32123574062076404+(npartons == 2)*0.3314444833963529+(npartons == 3)*0.3389929050626262+(npartons == 4)*0.2785338687268455) + (genbosonmass < 50.0)*(numberGeneratedEventsWeight * crossSectionPerEventWeight)))*(35870.0)*((gen_match_2==1 || gen_match_2==3)*(((abs(eta_1) < 1.46) * (1./0.6) * 1.22) + ((abs(eta_1) > 1.5588) * (1./0.88) * 1.47))+!(gen_match_2==1 || gen_match_2==3))"

I get XRootD error messages (see also the full error log attached):

Got a kXR_error response to request kXR_readv (handle: 0x00000000, chunks: 125398, total size: 27554298) [3002] Single readv transfer is too large.

followed by a ROOT segfault and a core dump. When I process the file locally, I don’t have any issues. Do you have any idea how to fix this error?

Best Wishes

Sebastian

EDIT: when I construct an equally long cut string which uses just one variable,

cut = "*".join(['(tt_max_index==0)'for i in xrange(200)])

the example also works, so the issue seems to be related to the number of different variables that are accessed via the cut string.

xrootd_error.txt (132.5 KB)

Hi!

This could be related to the issue ROOT-6639.

Perhaps @pcanal or @Axel remember the details?

Best
Stefan

This issue seems to have been fixed for a long time. @ganis, perhaps you have an idea?

I performed some additional checks, and it seems the issue is not exclusively related to the GridKa dCache: I tried using a different site and I get the same error.
I also tried using different input files, and the error seems to depend on the size of the input files. In my checks, streaming fails if the input file is larger than 32 GB.

Maybe this helps.

Hi,

To narrow this down, I set up a local XRootD server serving a copy of Sebastian's files and tried running his script against this server. This works fine and does not report an error. Consequently, I suspect this might well be an issue related to dCache (and maybe the implementation of XRootD on the dCache side).

I also checked the logs of the dCache instance for useful information and found log entries like the following for the time of the accesses (note that the first number differs depending on the cut string and file):
[] Vector read of 2097152016 bytes requested, exceeds maximum frame size of 2097152 bytes!

dCache claims to have received vector reads which are several orders of magnitude larger than allowed. Checking the log Sebastian provided, I do not see any mention of a request even remotely close to this size.
Is there any other way to check the size of the vector read requests sent by ROOT?

HTH,
Rene

Our expert is currently on leave; we’ll have to wait for his return before we can make progress here. Please ping us/me at the end of next week should you not hear from us before!

Just an update from the XRootD perspective: in vanilla XRootD the length of a single element in a readv request cannot exceed 2097136 bytes (2 MB - 16), and the maximum number of elements in a readv request cannot exceed 1024. You can find out what the limits are for a particular implementation (dCache) using the following query: xrdfs dcache-instance query config readv_ior_max readv_iov_max
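
For convenience, the same information can also be queried from Python via the XRootD client bindings. The following is only a sketch: it assumes the bindings are available in the environment and uses the GridKa endpoint from the example above; adjust the endpoint as needed.

from XRootD import client
from XRootD.client.flags import QueryCode

# Query the server's advertised readv limits (sketch; replace the endpoint as needed)
fs = client.FileSystem("root://cmsxrootd-kit.gridka.de:1094")
for key in ("readv_ior_max", "readv_iov_max"):
    status, response = fs.query(QueryCode.CONFIG, key)
    print("{}: {}".format(key, response))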

Those limits are not configurable; they were picked to match the use case readv was designed for, which is lots of ‘small’ transfers. If I’m not mistaken, for everything other than ‘small’ transfers ROOT will fall back to standard I/O?

If desired, I can implement a client-side mechanism that handles elements whose length is greater than the limit with a standard read request (if so, please create a respective issue at https://github.com/xrootd/xrootd).

(Sorry for my limited responsiveness, but I’m on sick leave, and next week, due to the winter break at schools, I’ll be on holiday.)

From what I see in the logs posted by @sbrommer, the size of the vector read indeed appears to be rather big:

[2020-01-30 14:39:08.842120 +0100][Dump ][AsyncSock ] [[2a00:139c:5:215:0:41:78:b8]:22994 #0.0] Successfully sent message: kXR_readv (handle: 0x00000000, chunks: 125398, total size: 27554298) (0x8c92520).
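
(For reference, output at this verbosity can be obtained by raising the XrdCl client log level before the files are opened; a minimal sketch, assuming the standard XRD_LOGLEVEL / XRD_LOGFILE environment variables:)

import os

# Enable the most verbose XrdCl client logging and write it to a file, so every
# outgoing request (including kXR_readv with its chunk count and total size)
# shows up in the log. This must be set before ROOT loads the XRootD plugin.
os.environ["XRD_LOGLEVEL"] = "Dump"
os.environ["XRD_LOGFILE"] = "xrdcl_client.log"

import ROOT  # import only after the environment has been set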

Unfortunately, we do not log the length of each individual element in a readv request (however, it seems the maximum number of elements in a readv request is exceeded).
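
(For a rough sense of scale, a back-of-the-envelope calculation on the numbers from the log line above:)

# Numbers taken from the kXR_readv log line above
chunks = 125398        # elements in the single readv request
total_size = 27554298  # bytes requested in total

print(total_size / float(chunks))  # roughly 220 bytes per element on average

# The individual elements are tiny (far below the 2097136-byte per-element limit),
# but the element count is far above the vanilla XRootD limit of 1024 elements
# per readv request.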

@rcaspart: is there a chance you still have the files that were used in this test? Could you make them available to me?

Sure, I put the files in my CERNBox:
base file: https://cernbox.cern.ch/index.php/s/VDcgyKTsaa8h5FE
friend file: https://cernbox.cern.ch/index.php/s/uf28Esj1CTxn72O

Regarding the limits, these are the replies I get from dCache:
[f01-120-184-e.gridka.de:21988] / > query config readv_ior_max
2097136
[f01-120-184-e.gridka.de:21988] / > query config readv_iov_max
178956968

The readv_iov_max here is not as large as in the issue mentioned by Stefan, but it still seems rather large, and I guess it might be more than the server can handle (although I am not sure whether this is the underlying problem here).

I spent a bit of time this afternoon digging around for this issue. It turns out the workaround deployed to make ROOT play nicely with dCache after the previous issue no longer works since dCache version 2.11. At that point a patch was implemented on the dCache side to mitigate this issue, which set readv_iov_max to 178956968 instead of the previous value of 2147483647 [1].

Since I suspected this might still be too large, I locally compiled a ROOT version where I changed the respective line to this new number and hence forced a lower readv_iov_max to be used. With this change I can run Sebastian’s script without problems.

Any advice on how to proceed?

[1] https://github.com/dCache/dcache/commit/92b36682cf3dd4aab14dab64ded6bf40e5660f4d

If you have a patch for this, maybe you could open a PR on GitHub?

@rcaspart: Just to clarify, the issue is dCache-specific, right? (E.g., I cannot reproduce it if I host the file on EOS/XRootD.) Could you tell us which line/value in ROOT you changed and what new value you used?

Yes, to my understanding this is dCache-specific, and my best guess is that you will not be able to reproduce it with EOS/XRootD unless you specifically alter the server settings to most likely unreasonable values.
What I did was change the value that was put into TNetXNGFile as a workaround for dCache following the issue ROOT-6639.
I changed the value against which the reply from the server is compared from "0x7FFFFFFF" to "0xAAAAAA8" (i.e. 178956968). This reflects changes in the dCache code from version 2.11 onward, where the readv_iov_max setting was changed, presumably to mitigate this issue on the dCache side.
