sbinet
June 23, 2022, 7:59am
#1
hi there,
I am trying to implement an RNTuple reader from the specs.
I’ve been able to properly parse the header/footer envelopes:
header:
{vers:1 minv:1 flags:[0] release:1 name:Staff descr: library:ROOT v6.26/04 fields:[{vers:0 typv:0 pfid:0 role:0 flag:0 nrep:0 fname:Category tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:1 role:0 flag:0 nrep:0 fname:Flag tname:std::uint32_t alias: descr:} {vers:0 typv:0 pfid:2 role:0 flag:0 nrep:0 fname:Age tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:3 role:0 flag:0 nrep:0 fname:Service tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:4 role:0 flag:0 nrep:0 fname:Children tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:5 role:0 flag:0 nrep:0 fname:Grade tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:6 role:0 flag:0 nrep:0 fname:Step tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:7 role:0 flag:0 nrep:0 fname:Hrweek tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:8 role:0 flag:0 nrep:0 fname:Cost tname:std::int32_t alias: descr:} {vers:0 typv:0 pfid:9 role:0 flag:0 nrep:0 fname:Division tname:std::string alias: descr:} {vers:0 typv:0 pfid:10 role:0 flag:0 nrep:0 fname:Nation tname:std::string alias: descr:}] cols:[{kind:11 bits:32 fieldID:0 flags:0} {kind:11 bits:32 fieldID:1 flags:0} {kind:11 bits:32 fieldID:2 flags:0} {kind:11 bits:32 fieldID:3 flags:0} {kind:11 bits:32 fieldID:4 flags:0} {kind:11 bits:32 fieldID:5 flags:0} {kind:11 bits:32 fieldID:6 flags:0} {kind:11 bits:32 fieldID:7 flags:0} {kind:11 bits:32 fieldID:8 flags:0} {kind:2 bits:32 fieldID:9 flags:5} {kind:5 bits:8 fieldID:9 flags:0} {kind:2 bits:32 fieldID:10 flags:5} {kind:5 bits:8 fieldID:10 flags:0}] aliases:[] extra:[] crc32:403897527},
footer:
{vers:1 minv:1 flags:[0] hdr:403897527 xhdrs:[] colGroups:[] clInfos:[{firstEntry:0 nentries:3354 colGrpID:-1}] clGroups:[{n:1 pages:{size:492 locator:{pos:72208 storage:207 url:}}}] mdBlocks:[] crc32:3437551349}
(on the ntpl001_staff.root file from the tutorials)
but beyond that, the specs are vaguer about how the data is organized in the data pages and how the columns extract that data (the indexing and split-encoding are also only mentioned “in passing”).
could these be clarified? (@jblomer I guess)
thanks,
-s
Hi Sebastien,
Some details of the specification are not yet merged, such as split encoding and 64-bit index columns. So the data files are slightly behind the specification (which is why the RNTuples in ROOT files are still marked “release candidate”).
Regarding the spec itself, can you point out where exactly it becomes unclear? Perhaps we can stick to the ntpl001_staff.root file as an example.
Cheers,
Jakob
(Since the ntpl001_staff.root example has no collections, we can even park the details on index columns for the time being.)
sbinet
June 24, 2022, 2:20pm
#4
thanks for the reply.
I had a bug in my decoding of the compressed payload, which led me astray,
and I worried it was caused by missing details about the split-encoding.
the meaning of the “32bit compression settings” (in the page list inner frame) is a bit opaque.
I assumed it is the same as the “usual” ROOT compression settings (algorithm and level):
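// rootCompressAlgLvl splits a ROOT compression setting into its algorithm
// and level, assuming the usual encoding: setting = 100*algorithm + level.
func rootCompressAlgLvl(v uint32) (Kind, int) {
	var (
		alg = Kind(v / 100) // Kind: the algorithm identifier type (defined elsewhere)
		lvl = int(v % 100)
	)
	return alg, lvl
}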
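For illustration, here is a minimal self-contained sketch of that helper, assuming the setting really follows ROOT's usual `100*algorithm + level` convention. The `Kind` type and the constant names are placeholders for this example, not names taken from the RNTuple specification; the numeric algorithm identifiers are the ones ROOT uses for its general compression settings.

```go
package main

import "fmt"

// Kind is a hypothetical name for the compression algorithm identifier.
type Kind int

// Algorithm identifiers used by ROOT's general compression settings.
const (
	ZLIB Kind = 1
	LZMA Kind = 2
	LZ4  Kind = 4
	ZSTD Kind = 5
)

// rootCompressAlgLvl assumes: setting = 100*algorithm + level.
func rootCompressAlgLvl(v uint32) (Kind, int) {
	return Kind(v / 100), int(v % 100)
}

func main() {
	for _, v := range []uint32{101, 207, 404, 505} {
		alg, lvl := rootCompressAlgLvl(v)
		fmt.Printf("settings=%d -> algorithm=%d level=%d\n", v, alg, lvl)
	}
}
```

Under this assumption a setting of 505 would mean ZSTD at level 5, and 207 would mean LZMA at level 7.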
anyways, I got it working for the ntpl001_staff.root file:
cluster[0,0,0]: Category
00000000 ca 00 00 00 12 02 00 00 3c 01 00 00 69 01 00 00 |........<...i...|
00000010 2e 01 00 00 2f 01 00 00 2e 01 00 00 69 01 00 00 |..../.......i...|
00000020 54 01 00 00 69 01 00 00 69 01 00 00 2f 01 00 00 |T...i...i.../...|
00000030 2e 01 00 00 2c 01 00 00 69 01 00 00 69 01 00 00 |....,...i...i...|
00000040 3c 01 00 00 2f 01 00 00 69 01 00 00 69 01 00 00 |<.../...i...i...|
00000050 a3 01 00 00 ca 00 00 00 30 01 00 00 cc 00 00 00 |........0.......|
00000060 cc 00 00 00 30 01 00 00 30 01 00 00 ca 00 00 00 |....0...0.......|
00000070 cc 00 00 00 ca 00 00 00 ca 00 00 00 2e 01 00 00 |................|
cluster[0,1,0]: Flag
00000000 0f 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000010 0f 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000020 0f 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000030 0f 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000040 0b 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000050 0d 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000060 0f 00 00 00 0f 00 00 00 0f 00 00 00 0f 00 00 00 |................|
00000070 0f 00 00 00 0b 00 00 00 0f 00 00 00 0d 00 00 00 |................|
[...]
cluster[0,12,0]: Nation
00000000 44 45 43 48 46 52 46 52 44 45 49 54 43 48 49 54 |DECHFRFRDEITCHIT|
00000010 44 45 46 52 46 52 43 48 43 48 43 48 44 45 46 52 |DEFRFRCHCHCHDEFR|
00000020 43 48 46 52 46 52 46 52 46 52 44 45 4e 4c 44 45 |CHFRFRFRFRDENLDE|
00000030 47 42 46 52 46 52 46 52 46 52 49 54 49 54 44 45 |GBFRFRFRFRITITDE|
00000040 4e 4c 43 48 46 52 49 54 47 42 47 42 43 48 43 48 |NLCHFRITGBGBCHCH|
00000050 44 45 49 54 43 48 46 52 43 48 46 52 49 54 46 52 |DEITCHFRCHFRITFR|
00000060 49 54 41 54 43 48 4e 4c 43 48 42 45 43 48 46 52 |ITATCHNLCHBECHFR|
00000070 43 48 46 52 47 42 41 54 4e 4f 46 52 41 54 43 48 |CHFRGBATNOFRATCH|
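To make the dumps above concrete, here is a rough sketch of how such uncompressed page bytes could be turned into values. It assumes little-endian 32-bit integers for `Category` and, for `Nation`, an offsets column whose entries are cumulative end positions into the character page. The offsets used below are an assumption chosen to match two-letter codes and are not read from the file; `decodeInt32s` and `splitStrings` are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeInt32s interprets a page as packed little-endian 32-bit integers.
func decodeInt32s(page []byte) []int32 {
	out := make([]int32, 0, len(page)/4)
	for i := 0; i+4 <= len(page); i += 4 {
		out = append(out, int32(binary.LittleEndian.Uint32(page[i:])))
	}
	return out
}

// splitStrings cuts a character page into strings, assuming offsets[i] is
// the (exclusive) end position of the i-th string within chars.
func splitStrings(offsets []uint32, chars []byte) []string {
	out := make([]string, 0, len(offsets))
	beg := uint32(0)
	for _, end := range offsets {
		out = append(out, string(chars[beg:end]))
		beg = end
	}
	return out
}

func main() {
	// First 16 bytes of the Category page shown above.
	category := []byte{
		0xca, 0x00, 0x00, 0x00, 0x12, 0x02, 0x00, 0x00,
		0x3c, 0x01, 0x00, 0x00, 0x69, 0x01, 0x00, 0x00,
	}
	fmt.Println(decodeInt32s(category)) // [202 530 316 361]

	// First 8 bytes of the Nation page shown above; the offsets are assumed.
	nation := []byte("DECHFRFR")
	offsets := []uint32{2, 4, 6, 8}
	fmt.Println(splitStrings(offsets, nation)) // [DE CH FR FR]
}
```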
what are the PRs (if any) that add the split-encoding documentation stanzas?
(feel free to mention me (@sbinet on github) on such documentation PRs)
thanks again.
PS: the specs as a whole are really nice to read. I wish I had something like that for TTree.
Cool that you managed to parse the format!
I added a clarification on the compression settings in a PR.
The split encoding (and more encodings) are in a separate branch. The code is branched off an older version of RNTuple and needs to be cleaned up a bit for the PRs; that includes the documentation. There was a longer discussion on Mattermost about the details. The code in the ntuple-split branch allowed us to look into the improvements we can get from “encoding before compression”, which are summarized in a Google Sheet.
Cheers,
Jakob