Dear all,
We processed the input files and saved the output as ROOT files. The total number of entries is identical in the input and output ROOT files, which is what we expect if the processing completed successfully.
However, for some of the output files, reading certain events produces the following messages:
R__unzipLZMA: error 9 in lzma_code
Error in <TBasket::ReadBasketBuffers>: fNbytes = 2362, fKeylen = 89, fObjlen = 17140, noutot = 0, nout=0, nin=2273, nbuf=17140
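As a quick sanity check on the numbers in this message, the sketch below parses the fields and compares them. It assumes (this is my reading of the message, not something I have confirmed in the ROOT sources) that nin is the compressed payload size, i.e. fNbytes minus fKeylen, and that noutot = 0 means nothing was decompressed. parse_basket_error is a hypothetical helper name.

```python
import re

# The message exactly as printed by ROOT for the failing basket.
MESSAGE = (
    "Error in <TBasket::ReadBasketBuffers>: fNbytes = 2362, fKeylen = 89, "
    "fObjlen = 17140, noutot = 0, nout=0, nin=2273, nbuf=17140"
)


def parse_basket_error(message):
    """Return the integer 'name = value' fields of the message as a dict."""
    return {
        name: int(value)
        for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", message)
    }


fields = parse_basket_error(MESSAGE)

# Assumption: compressed payload = total bytes on disk minus the key header.
compressed_payload = fields["fNbytes"] - fields["fKeylen"]

print(compressed_payload == fields["nin"])   # do the sizes agree?
print(fields["fObjlen"] == fields["nbuf"])   # expected uncompressed size
print(fields["noutot"])                      # bytes actually decompressed
```

If the sizes agree, the basket header itself looks self-consistent, which would point at the compressed payload rather than the bookkeeping.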
I can still read the entries for these events, and I can also skim the tree and save it to a new ROOT file using PyROOT [1].
Problem: When we try to read the file with uproot and save it to a .parquet file [2], we get the error shown in [3].
Question:
- Is there a workaround to avoid this error when working with uproot?
- What causes this issue in our output ROOT files? In other words, why is it present in a file that I created myself?
[1]
import ROOT

inFile = "root://eos.cms.rcac.purdue.edu:1094//store/user/rasharma/customNanoAOD_Others/UL2018/TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8/04005D2F-8BFF-8B43-BA27-4D78C5CBE179_NanoAOD.root"
file = ROOT.TFile.Open(inFile)
tree = file.Get("Events")
output_file = ROOT.TFile.Open("out.root", "RECREATE")
# Full copy of the tree; the variable is overwritten by the skim on the next line
skimmed_tree = tree.CloneTree()
# Keep only events with at least two muons, the first two with pT > 20
skimmed_tree = tree.CopyTree("nMuon >= 2 && Muon_pt[0] > 20 && Muon_pt[1] > 20")
output_file.WriteObject(skimmed_tree, "physics")
output_file.Close()
[2]
import ROOT
import pandas as pd
inFile = "root://eos.cms.rcac.purdue.edu:1094//store/user/rasharma/customNanoAOD_Others/UL2018/TTTo2L2Nu_TuneCP5_13TeV-powheg-pythia8/04005D2F-8BFF-8B43-BA27-4D78C5CBE179_NanoAOD.root"
file = ROOT.TFile.Open(inFile)
tree = file.Get("Events")
print(tree.GetEntries())
nMuons = []
muon_pt = []
muon_eta = []
muon_phi = []
nentries = tree.GetEntries()
nentries = 20505  # override: stop after entry 20504
for i in range(nentries):
    # Only look at entries 20500-20504
    if i < 20500:
        continue
    print(f"=====> Processing entry {i} <=====")
    tree.GetEntry(i)
    nMuons.append(tree.nMuon)
    # tree.Muon_pt is a cppyy.LowLevelView (see the error in [3]), not a Python list
    muon_pt.append(tree.Muon_pt)
    muon_eta.append(tree.Muon_eta)

df = pd.DataFrame({
    "nMuons": nMuons,
    "Muon_pt": muon_pt,
    "Muon_eta": muon_eta,
})
print(df.head())
df.to_parquet("test.parquet")
[3]
   nMuons                                   Muon_pt                                    Muon_eta
0       2  [39.771934509277344, 14.352097511291504]  [-0.8841629028320312, -0.5072555541992188]
1       1                      [39.771934509277344]                       [-0.8841629028320312]
2       2  [39.771934509277344, 14.352097511291504]  [-0.8841629028320312, -0.5072555541992188]
3       1                      [39.771934509277344]                       [-0.8841629028320312]
4       1                      [39.771934509277344]                       [-0.8841629028320312]
Traceback (most recent call last):
File "/depot/cms/private/users/shar1172/copperheadV2/scripts/test_LZMAError_UseUproot.py", line 47, in <module>
df.to_parquet("test.parquet")
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pandas/core/frame.py", line 3113, in to_parquet
return to_parquet(
^^^^^^^^^^^
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pandas/io/parquet.py", line 480, in to_parquet
impl.write(
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pandas/io/parquet.py", line 190, in write
table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 4623, in pyarrow.lib.Table.from_pandas
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 616, in dataframe_to_arrays
arrays = [convert_column(c, f)
^^^^^^^^^^^^^^^^^^^^
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 603, in convert_column
raise e
File "/depot/cms/kernels/coffea_latest/lib/python3.12/site-packages/pyarrow/pandas_compat.py", line 597, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 358, in pyarrow.lib.array
File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert <cppyy.LowLevelView object at 0x7f357c110cb0> with type cppyy.LowLevelView: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column Muon_pt with type object')
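The repeated values in the printout above and the ArrowInvalid error both look consistent with one explanation, which is an assumption on my part and not verified against this file: tree.Muon_pt returns a cppyy.LowLevelView onto a buffer that is reused for every entry, so the lists hold views rather than copies, and pyarrow cannot infer a type for them. The sketch below imitates the effect with the standard library only. array and memoryview stand in for the ROOT buffer and the LowLevelView, and the numbers are made up.

```python
from array import array

# Stand-in for a branch buffer that is overwritten on every GetEntry call.
buffer = array("f", [0.0, 0.0])

# Made-up per-entry values (chosen to be exactly representable as float32).
entries = [[39.5, 14.25], [20.0, 10.5], [55.75, 30.0]]

views = []   # what appending the raw view effectively stores
copies = []  # per-entry Python lists, copied at append time

for values in entries:
    # "GetEntry": overwrite the shared buffer in place
    buffer[0], buffer[1] = values
    views.append(memoryview(buffer))  # still points at the shared buffer
    copies.append(list(buffer))       # snapshot of the current contents

# Every view shows the last entry; the copies keep the per-entry values.
print([v.tolist() for v in views])
print(copies)
```

In the PyROOT loop of [2] this would presumably correspond to appending list(tree.Muon_pt) instead of tree.Muon_pt, which I have not yet tested on the problematic file.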
ROOT Version: 6.26/11
Platform: LXPLUS (CERN)
Container: el8_amd64_gcc11
CMSSW: CMSSW_13_0_14