Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterFinder tool cleanup #192

Open
marc1uk opened this issue May 30, 2021 · 0 comments
Open

ClusterFinder tool cleanup #192

marc1uk opened this issue May 30, 2021 · 0 comments

Comments

@marc1uk
Copy link
Collaborator

marc1uk commented May 30, 2021

lines 60--120 seem to be dangling code copied over from an old tool that have no functionality here. the diffuser position is not set, and the resulting 'expected times' vector is never used.

lines 251--262 using std::sort would be shorter and clearer.

A key element of this tool seems to be finding clusters of hits by explicitly checking for hits at floating point times in 2ns chunks?. It's generally not a good idea to be comparing floating point (or double) values using ==, and I'm not sure if times are expected to fall in such 2ns bins? (if so, why are we using a double?) Moreover, the preceding comment suggests this is counting the number of hits in a window, so rather than checking an oddly restricted subset of all possible times for a hit, it should probably be looping over hits and checking if(hit_time >= window_start && hit_time < window_end).

Furthermore the cluster finding algorithm in general seems questionable. What's happening here is, starting from each hit time, it extends a window forward and counts the number of hits in that window. So if we had times at 10, 12, 13, 14, 15, 16 the window size is 3, and the minimum cluster size is 2, it'll form clusters from [10,12], [12,13,14], [13,14,15], and [14,15,16]. It then scans through these clusters, takes the largest, and after noting it, removes all hits within that cluster from any other clusters. Since we have multiple of the same size, it'll take the first largest - [12,13,14]. It then removes all those elements from any other clusters - i.e. we end up with [10], [15], and [15,16]. It then looks for the next biggest cluster, which is [15,16]. But the "cluster time" for this cluster is 14, because that was the time it originally opened its window. That no longer matches the hits still in that cluster, and the times of the remaining hits in the cluster are subsequently ignored when building the cluster charge and hits.
While this example has been limited to just a very small number of hits, I think it illustrates that this algorithm can build clusters with incorrect times, charges and contained hits.

lines 457--461 results in a pointless loop if verbose<=2, move the if statement to wrap the loop

lines 463--495 are doing nothing

As per #190 this tool is a prime example of MC / Data code factoring. Lines 199--215 are exactly the same as the following lines 217--233, with the sole replacement of a std::vector<Hit> with std::vector<MCHit>. The same is true of 356--372 vs 373--389 and 407--427 vs 428-448.

MCHit is derived from Hit, and no MC specific members are being accessed here. We could work around the types by putting the code in templated functions to have the compiler instantiate the two versions without impacting readability, but that seems a bit awkward. Perhaps a better way is to have data be stored as a vector of pointers to Hit objects, which can then be dynamically cast to MCHitobjects as necessary. We can use smart pointers to ensure memory is appropriately released; perhaps something like:

std::vector<std::shared_ptr<Hit>> Hits;
Hits.emplace_back(std::make_shared<Hit>(arg1,arg2));
// or alternatively
Hits.emplace_back(std::make_shared<MCHit>(arg1,arg2));
@marc1uk marc1uk changed the title timeclustering tool cleanup ClusterFinder tool cleanup May 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant