Better browsing (#199)
* Don't choke on uninterpretable data

* Return branch element if end of branching reached

* Remove unnecessary test condition
tamasgal authored Feb 8, 2023
1 parent 7bf7918 commit 1a239fd
Showing 7 changed files with 120 additions and 57 deletions.
102 changes: 79 additions & 23 deletions README.md
@@ -11,6 +11,25 @@
UnROOT.jl is a reader for the [CERN ROOT](https://root.cern) file format
written entirely in Julia, without any dependence on ROOT or Python.

## Important API changes in v0.9.0

We decided to alter the behaviour of `getindex(f::ROOTFile, s::AbstractString)`, which is essentially
the method called when `f["foo/bar"]` is used. Before `v0.9.0`, `UnROOT` tried to make a best guess
and return a tree/branch or even fully parsed data. This led to two major issues:

1. Errors prevented any further exploration once `UnROOT` hit something it could not interpret, even when the user had not requested it (e.g. a single uninterpretable branch in a tree whose other branches would work fine).
2. Unpredictable behaviour (type instability): the path dictates which type of data is returned.

Starting from `v0.9.0` we introduce an interface where `f["..."]` always returns genuine ROOT datatypes (or custom ones if you provide interpretations) and only performs the actual parsing when explicitly requested by the user via helper methods like `LazyBranch(f, "...")`.

Long story short, the following pattern can be used to fix your code when upgrading to `v0.9.0`:

    f["foo/bar"] => LazyBranch(f, "foo/bar")

The `f["foo/bar"]` accessor should now work on almost all files and is a handy utility to explore the ROOT data structures.

See [PR199](https://github.com/JuliaHEP/UnROOT.jl/pull/199) for more details.
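
As a rough illustration of the migration pattern above, the following sketch contrasts the two accessors. It assumes `UnROOT` v0.9.0 or later and that the sample file `tree_with_jagged_array.root` from the repository's `test/samples` directory is available locally through `UnROOT.samplefile`; the variable names are only illustrative.

```julia
using UnROOT

# Assumption: the sample file from the UnROOT repository (test/samples) is present.
f = UnROOT.samplefile("tree_with_jagged_array.root")

# From v0.9.0 on, indexing returns the ROOT object itself; no data is parsed here.
branch = f["t1/int32_array"]

# Parsing has to be requested explicitly.
data = LazyBranch(f, "t1/int32_array")
first_entry = data[1]
last_entry = data[end]

close(f)
```

Code that previously indexed the result of `f["t1/int32_array"]` directly would now index `data` instead.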

## Installation Guide
1. Download the latest [Julia release](https://julialang.org/downloads/)
2. Open up Julia REPL (hit `]` once to enter Pkg mode, hit backspace to exit it)
@@ -27,24 +46,31 @@ julia> using UnROOT
julia> f = ROOTFile("test/samples/NanoAODv5_sample.root")
ROOTFile with 2 entries and 21 streamers.
test/samples/NanoAODv5_sample.root
└─ Events
├─ "run"
├─ "luminosityBlock"
├─ "event"
├─ "HTXS_Higgs_pt"
├─ "HTXS_Higgs_y"
└─ ""
├─ Events (TTree)
│ ├─ "run"
│ ├─ "luminosityBlock"
│ ├─ "event"
│ ├─ ""
│ ├─ "L1_UnpairedBunchBptxPlus"
│ ├─ "L1_ZeroBias"
│ └─ "L1_ZeroBias_copy"
└─ untagged (TObjString)


julia> mytree = LazyTree(f, "Events", ["Electron_dxy", "nMuon", r"Muon_(pt|eta)$"])
Row │ Electron_dxy nMuon Muon_eta Muon_pt
│ Vector{Float32} UInt32 Vector{Float32} Vector{Float32}
─────┼───────────────────────────────────────────────────────────
1 │ [0.000371] 0 [] []
2 │ [-0.00982] 2 [0.53, 0.229] [19.9, 15.3]
3 │ [] 0 [] []
4 │ [-0.00157] 0 [] []

Row │ Electron_dxy nMuon Muon_pt Muon_eta
│ SubArray{Float3 UInt32 SubArray{Float3 SubArray{Float3
─────┼────────────────────────────────────────────────────────────────────────────
1 │ [0.000371] 0 [] []
2 │ [-0.00982] 2 [19.9, 15.3] [0.53, 0.229]
3 │ [] 0 [] []
4 │ [-0.00157] 0 [] []
5 │ [] 0 [] []
6 │ [-0.00126] 0 [] []
7 │ [0.0612, 0.000642] 2 [22.2, 4.43] [-1.13, 1.98]
8 │ [0.00587, 0.000549, -0.00617] 0 [] []
992 rows omitted
```
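
A short sketch of combining the regular-expression branch selection shown above with an explicit `LazyBranch`. It assumes `UnROOT` v0.9.0 or later and that `NanoAODv5_sample.root` from the repository's `test/samples` directory can be opened via `UnROOT.samplefile`; the variable names are illustrative.

```julia
using UnROOT

# Assumption: the sample file from the UnROOT repository (test/samples) is present.
f = UnROOT.samplefile("NanoAODv5_sample.root")

# Select only the branches whose names match the regular expression.
tree = LazyTree(f, "Events", r"Muon_(pt|eta)$")
selected = sort(collect(propertynames(tree)))

# A single branch is read explicitly through LazyBranch.
n_pass = sum(LazyBranch(f, "Events/HLT_Mu3_PFJet40"))

close(f)
```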
### RNTuple
@@ -57,20 +83,30 @@ julia> using UnROOT
julia> f = ROOTFile("./test/samples/RNTuple/test_ntuple_stl_containers.root");

julia> f["ntuple"]
UnROOT.RNTuple:
header:
UnROOT.RNTuple with 5 rows, 13 fields, and metadata:
header:
name: "ntuple"
ntuple_description: ""
writer_identifier: "ROOT v6.29/01"
schema:
schema:
RNTupleSchema with 13 top fields
├─ :lorentz_vector Struct
├─ :vector_tuple_int32_string Vector
├─ :string String
├─ :vector_string Vector
...
..
.
├─ :vector_vector_int32 Vector
├─ :vector_variant_int64_string Vector
├─ :vector_vector_string Vector
├─ :variant_int32_string Union
├─ :array_float StdArray{3}
├─ :tuple_int32_string Struct
├─ :array_lv StdArray{3}
├─ :pair_int32_string Struct
└─ :vector_int32 Vector

footer:
cluster_summaries: UnROOT.ClusterSummary[ClusterSummary(num_first_entry=0, num_entries=5)]

julia> LazyTree(f, "ntuple")
Row │ string vector_int32 array_float vector_vector_i vector_string vector_vector_s variant_int32_s vector_variant_
│ String Vector{Int32} StaticArraysCor Vector{Vector{I Vector{String} Vector{Vector{S Union{Int32, St Vector{Union{In
@@ -109,11 +145,31 @@ XRootD is also supported, depending on the protocol:
- (1.6+ only) or the "url" has to start with `root://` and have another `//` to separate server and file path
```julia
julia> r = @time ROOTFile("https://scikit-hep.org/uproot3/examples/Zmumu.root")
0.034877 seconds (5.13 k allocations: 533.125 KiB)
3.284499 seconds (13.10 M allocations: 670.450 MiB, 4.62% gc time, 93.34% compilation time)
ROOTFile with 1 entry and 18 streamers.
https://scikit-hep.org/uproot3/examples/Zmumu.root
└─ events (TTree)
├─ "Type"
├─ "Run"
├─ "Event"
├─ ""
├─ "phi2"
├─ "Q2"
└─ "M"
julia> r = ROOTFile("root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root")
ROOTFile with 1 entry and 19 streamers.
root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root
└─ Events (TTree)
├─ "run"
├─ "luminosityBlock"
├─ "event"
├─ ""
├─ "Electron_dxyErr"
├─ "Electron_dz"
└─ "Electron_dzErr"
```
## TBranch of custom struct
9 changes: 9 additions & 0 deletions src/displays.jl
@@ -46,6 +46,15 @@ function children(t::TTree)
return ks
end
end
children(t::Union{TTree, TBranchElement}) = t.fBranches
function Base.show(io::IO, ::MIME"text/plain", b::Union{TTree, TBranchElement})
if isempty(b.fBranches)
print(io, b)
else
print_tree(io, b)
end
end
printnode(io::IO, t::TBranchElement) = print(io, "$(t.fName)")
printnode(io::IO, t::TTree) = print(io, "$(t.fName) (TTree)")
printnode(io::IO, f::ROOTFile) = print(io, f.filename)
printnode(io::IO, f::ROOTDirectory) = print(io, "$(f.name) (TDirectory)")
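
The display logic added in this hunk can be tried in isolation. The sketch below is not the UnROOT code: it uses a hypothetical `DemoBranch` type together with the `AbstractTrees` package (`children`, `printnode`, `print_tree`) to show the same idea of printing a leaf plainly and a node with sub-branches as a tree. Unlike the hunk, the leaf case prints only the name.

```julia
using AbstractTrees

# Hypothetical stand-in for a TTree/TBranchElement with sub-branches.
struct DemoBranch
    name::String
    branches::Vector{DemoBranch}
end

AbstractTrees.children(b::DemoBranch) = b.branches
AbstractTrees.printnode(io::IO, b::DemoBranch) = print(io, b.name)

# Leaves are printed directly, everything else as a tree.
function Base.show(io::IO, ::MIME"text/plain", b::DemoBranch)
    if isempty(b.branches)
        print(io, b.name)
    else
        print_tree(io, b)
    end
end

leaf = DemoBranch("pt", DemoBranch[])
tree = DemoBranch("Events", [leaf, DemoBranch("eta", DemoBranch[])])

leaf_text = sprint(show, MIME("text/plain"), leaf)
tree_text = sprint(show, MIME("text/plain"), tree)
```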
3 changes: 2 additions & 1 deletion src/iteration.jl
@@ -122,6 +122,7 @@ mutable struct LazyBranch{T,J,B} <: AbstractVector{T}
[0:-1 for _ in 1:Threads.nthreads()])
end
end
LazyBranch(f::ROOTFile, s::AbstractString) = LazyBranch(f, _getindex(f, s))
basketarray(lb::LazyBranch, ithbasket) = basketarray(lb.f, lb.b, ithbasket)
basketarray_iter(lb::LazyBranch) = basketarray_iter(lb.f, lb.b)

@@ -404,7 +405,7 @@ function LazyTree(f::ROOTFile, tree::TTree, s, branches)
replace!(tail, "fCoordinates" => "")
norm_name = join([head; join(tail)], "_")
end
d[Symbol(norm_name)] = f["$s/$b"]
d[Symbol(norm_name)] = LazyBranch(f, "$s/$b")
end
return LazyTree(NamedTuple{Tuple(keys(d))}(values(d)))
end
12 changes: 2 additions & 10 deletions src/root.jl
@@ -135,15 +135,7 @@ end


function Base.getindex(f::ROOTFile, s::AbstractString)
S = _getindex(f, s)
if S isa Union{TBranch, TBranchElement}
# try # if we can't construct LazyBranch, just give up (maybe due to custom class)
return LazyBranch(f, S)
# catch
# @warn "Can't automatically create LazyBranch for branch $s. Returning a branch object"
# end
end
S
_getindex(f, s)
end

@memoize LRU(maxsize = 2000) function _getindex(f::ROOTFile, s)
@@ -402,7 +394,6 @@ function auto_T_JaggT(f::ROOTFile, branch; customstructs::Dict{String, Type})
else
leaftype = _normalize_ftype(leaf.fType)
_type = get(_leaftypeconstlookup, leaftype, nothing)
isnothing(_type) && error("Cannot interpret type.")
if branch.fType == Const.kSubbranchSTLCollection
_type = Vector{_type}
end
@@ -486,6 +477,7 @@ function readbasketseek(f::ROOTFile, branch::Union{TBranch, TBranchElement}, see
basketkey = unpack(rawbuffer, TBasketKey)
compressedbytes = compressed_datastream(rawbuffer, basketkey)

@debug "Seek position: $seek_pos"
basketrawbytes = decompress_datastreambytes(compressedbytes, basketkey)

@debug begin
4 changes: 4 additions & 0 deletions src/streamers.jl
@@ -31,6 +31,7 @@ struct TStreamerInfo{T}
end

function unpack(io, tkey::TKey, refs::Dict{Int32, Any}, T::Type{TStreamerInfo})
@debug "Unpacking: $(tkey)"
preamble = Preamble(io, T)
fName, fTitle = nametitle(io)
fCheckSum = readtype(io, UInt32)
@@ -305,8 +306,11 @@ struct TObjArray
elements
end
Base.getindex(obj::TObjArray, index) = obj.elements[index]
Base.length(a::TObjArray) = length(a.elements)
Base.iterate(a::TObjArray, state=1) = state > length(a) ? nothing : (a.elements[state], state+1)

function unpack(io, tkey::TKey, refs::Dict{Int32, Any}, T::Type{TObjArray})
@debug "Unpacking: $(tkey)"
preamble = Preamble(io, T)
skiptobj(io)
name = readtype(io, String)
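
The `Base.length` and `Base.iterate` methods added for `TObjArray` in this file follow Julia's standard iteration protocol. A minimal, self-contained sketch of the same pattern, using a hypothetical `ElementList` wrapper rather than the real `TObjArray`:

```julia
# Hypothetical wrapper type, standing in for TObjArray.
struct ElementList
    elements::Vector{Any}
end

Base.getindex(l::ElementList, i) = l.elements[i]
Base.length(l::ElementList) = length(l.elements)
# Return `nothing` once the state runs past the end, otherwise (item, next state).
Base.iterate(l::ElementList, state=1) =
    state > length(l) ? nothing : (l.elements[state], state + 1)

list = ElementList(Any["run", "event", 42])

# With `iterate` and `length` defined, for-loops and `collect` work.
items = collect(list)
count_seen = 0
for _ in list
    global count_seen += 1
end
```

Defining `length` alongside `iterate` matters because `collect` relies on it by default to size the result.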
3 changes: 2 additions & 1 deletion src/types.jl
@@ -151,7 +151,8 @@ function decompress_datastreambytes(compbytes, tkey)
compression_header = unpack(io, CompressionHeader)
cname, _, compbytes, uncompbytes = unpack(compression_header)
rawbytes = read(io, compbytes)
@debug cname
@debug "Compression type: $(cname)"
@debug "Compressed/uncompressed size in bytes: $(compbytes) / $(uncompbytes)"

if cname == "L4"
# skip checksum which is 8 bytes
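
The `@debug` messages introduced in this commit are hidden by default. One way to see them is to set `ENV["JULIA_DEBUG"] = "UnROOT"` before running the code; another is to wrap the call in a logger with a lower minimum level. The sketch below shows the second approach with a hypothetical `describe_chunk` function that mimics the two messages from this hunk; it is not part of UnROOT.

```julia
using Logging

# Hypothetical function emitting the same kind of debug messages as the hunk above.
function describe_chunk(cname, compbytes, uncompbytes)
    @debug "Compression type: $(cname)"
    @debug "Compressed/uncompressed size in bytes: $(compbytes) / $(uncompbytes)"
    return uncompbytes / compbytes
end

# A ConsoleLogger with minimum level Debug prints the messages to stderr.
ratio = with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    describe_chunk("L4", 100, 400)
end
```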
44 changes: 22 additions & 22 deletions test/runtests.jl
@@ -181,13 +181,13 @@ end
close(rootfile)

rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_large_array_lz4.root"))
arr = collect(rootfile["t1/float_array"])
arr = collect(LazyBranch(rootfile, rootfile["t1/float_array"]))
@test 100000 == length(arr)
@test [0.0, 1.0588236, 2.1176472, 3.1764705, 4.2352943] ≈ arr[1:5] atol=1e-7
close(rootfile)

rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_int_array_zstd.root"))
arr = collect(rootfile["t1/a"])
arr = collect(LazyBranch(rootfile, "t1/a"))
@test arr == 0:99
close(rootfile)
end
@@ -249,7 +249,7 @@ end
@testset "TLorentzVector" begin
# 64bits T
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "TLorentzVector.root"))
branch = rootfile["t1/LV"]
branch = LazyBranch(rootfile, "t1/LV")
tree = LazyTree(rootfile, "t1")

@test branch[1].x == 1.0
@@ -262,7 +262,7 @@ end

# jagged LVs
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "Jagged_TLorentzVector.root"))
branch = rootfile["t1/LVs"]
branch = LazyBranch(rootfile, "t1/LVs")
tree = LazyTree(rootfile, "t1")

@test eltype(branch) <: AbstractVector{LorentzVectors.LorentzVector{Float64}}
@@ -273,7 +273,7 @@ end

@testset "TNtuple" begin
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "TNtuple.root"))
arrs = [collect(rootfile["n1/$c"]) for c in "xyz"]
arrs = [collect(LazyBranch(rootfile, "n1/$c")) for c in "xyz"]
@test length.(arrs) == fill(100, 3)
@test arrs[1] ≈ 0:99
@test arrs[2] ≈ arrs[1] .+ arrs[1] ./ 13
@@ -284,7 +284,7 @@ end
@testset "Singly jagged branches" begin
# 32bits T
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_jagged_array.root"))
data = rootfile["t1/int32_array"]
data = LazyBranch(rootfile, "t1/int32_array")
@test data[1] == Int32[]
@test data[1:2] == [Int32[], Int32[0]]
@test data[end] == Int32[90, 91, 92, 93, 94, 95, 96, 97, 98]
@@ -293,7 +293,7 @@ end
# 64bits T
T = Float64
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "tree_with_jagged_array_double.root"))
data = rootfile["t1/double_array"]
data = LazyBranch(rootfile, "t1/double_array")
@test data isa AbstractVector
@test eltype(data) <: AbstractVector{T}
@test data[1] == T[]
@@ -326,11 +326,11 @@ end
vvi = [[[2], [3, 5]], [[7, 9, 11], [13]], [[17], [19], []], [], [[]]]
vvf = [[[2.5], [3.5, 5.5]], [[7.5, 9.5, 11.5], [13.5]], [[17.5], [19.5], []], [], [[]]]
@test UnROOT.array(rootfile, "t1/bi") == vvi
@test rootfile["t1/bi"] == vvi
@test eltype(eltype(eltype(rootfile["t1/bi"]))) === Int32
@test LazyBranch(rootfile, "t1/bi") == vvi
@test eltype(eltype(eltype(LazyBranch(rootfile, "t1/bi")))) === Int32
@test UnROOT.array(rootfile, "t1/bf") == vvf
@test rootfile["t1/bf"] == vvf
@test eltype(eltype(eltype(rootfile["t1/bf"]))) === Float32
@test LazyBranch(rootfile, "t1/bf") == vvf
@test eltype(eltype(eltype(LazyBranch(rootfile, "t1/bf")))) === Float32
close(rootfile)
end

@@ -365,7 +365,7 @@ end
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "NanoAODv5_sample.root"))
event = UnROOT.array(rootfile, "Events/event")
@test event[1:3] == UInt64[12423832, 12423821, 12423834]
Electron_dxy = rootfile["Events/Electron_dxy"]
Electron_dxy = LazyBranch(rootfile, "Events/Electron_dxy")
@test eltype(Electron_dxy) == SubArray{Float32, 1, Vector{Float32}, Tuple{UnitRange{Int64}}, true}
@test Electron_dxy[1:3] ≈ [Float32[0.0003705], Float32[-0.00981903], Float32[]]
HLT_Mu3_PFJet40 = UnROOT.array(rootfile, "Events/HLT_Mu3_PFJet40")
@@ -377,7 +377,7 @@ end
tree = LazyTree(rootfile, "Events", r"Muon_(pt|eta)$")
@test sort(propertynames(tree) |> collect) == sort([:Muon_pt, :Muon_eta])
@test occursin("LazyEvent", repr(first(iterate(tree))))
@test sum(rootfile["Events/HLT_Mu3_PFJet40"]) == 443
@test sum(LazyBranch(rootfile, "Events/HLT_Mu3_PFJet40")) == 443
close(rootfile)
end

@@ -467,9 +467,9 @@ end
"KM3NETDAQ::JDAQEvent.KM3NETDAQ::JDAQEventHeader" => UnROOT._KM3NETDAQEventHeader
)
f_auto = UnROOT.ROOTFile(joinpath(SAMPLES_DIR, "km3net_online.root"), customstructs=customstructs)
headers_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader"]
event_hits_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/snapshotHits"]
event_thits_auto = f_auto["KM3NET_EVENT/KM3NET_EVENT/triggeredHits"]
headers_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader")
event_hits_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/snapshotHits")
event_thits_auto = LazyBranch(f_auto, "KM3NET_EVENT/KM3NET_EVENT/triggeredHits")

for event_hits ∈ [event_hits_manual, event_hits_auto]
@test length(event_hits) == 3
@@ -592,7 +592,7 @@ end
rootfile = UnROOT.samplefile("cms_ntuple_wjet.root")
pts1 = UnROOT.array(rootfile, "variable/met_p4/fCoordinates/fCoordinates.fPt"; raw=false)
pts2 = LazyTree(rootfile, "variable", [r"met_p4/fCoordinates/.*", "mll"])[!, Symbol("met_p4_fPt")]
pts3 = rootfile["variable/good_jets_p4/good_jets_p4.fCoordinates.fPt"]
pts3 = LazyBranch(rootfile, "variable/good_jets_p4/good_jets_p4.fCoordinates.fPt")
@test 24 == length(pts1)
@test Float32[69.96958, 25.149912, 131.66693, 150.56802] == pts1[1:4]
@test pts1 == pts2
@@ -601,7 +601,7 @@ end

# issue 61
rootfile = UnROOT.samplefile("issue61.root")
@test rootfile["Events/Jet_pt"][:] == Vector{Float32}[[], [27.324587, 24.889547, 20.853024], [], [20.33066], [], []]
@test LazyBranch(rootfile, "Events/Jet_pt")[:] == Vector{Float32}[[], [27.324587, 24.889547, 20.853024], [], [20.33066], [], []]
close(rootfile)

# issue 78
@@ -615,8 +615,8 @@ end
# unsigned short -> Int16, ulong64 -> UInt64
# file minified with `rooteventselector --recreate -l 2 "trackntuple.root:trackingNtuple/tree" issue108_small.root`
rootfile = ROOTFile(joinpath(SAMPLES_DIR, "issue108_small.root"))
@test rootfile["tree/trk_algoMask"][2] == [0x0000000000004000, 0x0000000000004000, 0x0000000000004000, 0x0000000000004000]
@test rootfile["tree/pix_ladder"][3][1:5] == UInt16[0x0001, 0x0001, 0x0001, 0x0001, 0x0003]
@test LazyBranch(rootfile, "tree/trk_algoMask")[2] == [0x0000000000004000, 0x0000000000004000, 0x0000000000004000, 0x0000000000004000]
@test LazyBranch(rootfile, "tree/pix_ladder")[3][1:5] == UInt16[0x0001, 0x0001, 0x0001, 0x0001, 0x0003]
close(rootfile)

# issue 116
@@ -747,10 +747,10 @@ end
@test sort(keys(f["mydir"])) == ["Events", "c", "d", "mysubdir"]
@test sort(keys(f["mydir/mysubdir"])) == ["e", "f"]
@test sum(length.(LazyTree(f, "mydir/Events").Jet_pt)) == 4
@test sum(length.(f["mydir/Events/Jet_pt"])) == 4
@test sum(length.(LazyBranch(f, "mydir/Events/Jet_pt"))) == 4

f = UnROOT.samplefile("issue11_tdirectory.root")
@test sum(f["Data/mytree/Particle0_E"]) ≈ 1012.0
@test sum(LazyBranch(f, "Data/mytree/Particle0_E")) ≈ 1012.0
end

@testset "Basic C++ types" begin