make.index.unique() can create unsorted index #241

joshuaulrich · 2018-05-08T03:24:55Z

make.index.unique() should add a small eps to duplicate index values. This often works, but it can fail when there are consecutive observations with the same timestamp and the timestamp of the first observation after the block of duplicates is less than the duplicated timestamp plus the cumulative eps.

options(digits.secs = 6)
(x <- .xts(1:5, c(rep(0, 4), 2) / 1e6))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000000    2
# 1969-12-31 18:00:00.000000    3
# 1969-12-31 18:00:00.000000    4
# 1969-12-31 18:00:00.000002    5
(y <- make.index.unique(x))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000001    2
# 1969-12-31 18:00:00.000002    3
# 1969-12-31 18:00:00.000003    4
# 1969-12-31 18:00:00.000002    5

It would be nice if this could be fixed, but floating point rounding error is likely to be a problem in the cases where this may occur. A warning should be raised, at minimum.
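To make the failure mode concrete, here is a minimal base-R sketch of the behavior described above. It is not the xts C implementation: `add_eps_to_duplicates()` is a hypothetical helper, and it assumes the index is held in whole microseconds with eps = 1 so that the arithmetic is exact and floating point rounding does not get in the way.

```r
# Hypothetical illustration of the pre-fix logic (not the xts source).
# Assumption: index values are whole microseconds, eps is 1 microsecond.
add_eps_to_duplicates <- function(index, eps = 1) {
  newindex <- index
  for (i in seq_along(index)[-1]) {
    # Only bump a value when it duplicates the previous ORIGINAL value.
    if (index[i] == index[i - 1]) {
      newindex[i] <- newindex[i - 1] + eps
    }
  }
  newindex
}

idx <- c(0, 0, 0, 0, 2)          # same pattern as the example above
old <- add_eps_to_duplicates(idx)
print(old)                       # [1] 0 1 2 3 2
print(is.unsorted(old))          # [1] TRUE
```

The last value (2) is never touched because it is not a duplicate, yet the cumulative eps has already pushed the previous value to 3, so the result is no longer sorted.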

Session Info

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.10-2.1 zoo_1.8-1   

loaded via a namespace (and not attached):
[1] compiler_3.4.3  tools_3.4.3     grid_3.4.3      lattice_0.20-35


The functional change in this commit is adding eps when this is true: newindex_real[i] <= newindex_real[i-1], instead of when this is true: index_real[i-1] == index_real[i]. This ensures the new index is always in increasing order, even if the observation following a contiguous block of duplicate timestamps is less than the sum of the epsilons. The memcpy ensures 'newindex' is equal to the current 'index' before the loop, which also means we no longer need 'index_real'. Fixes #241.
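A rough base-R rendering of the comparison change described in this commit message might look like the following. The actual change lives in the package's C code; `bump_until_increasing()` is a hypothetical name, and the whole-microsecond index with eps = 1 is an assumption made to keep the arithmetic exact.

```r
# Hypothetical sketch of the post-fix comparison (not the xts source).
# Assumption: index values are whole microseconds, eps is 1 microsecond.
bump_until_increasing <- function(index, eps = 1) {
  newindex <- index  # stands in for the memcpy of 'index' into 'newindex'
  for (i in seq_along(newindex)[-1]) {
    # Compare against the already-adjusted previous value, so a value that
    # was overtaken by the cumulative eps is bumped as well.
    if (newindex[i] <= newindex[i - 1]) {
      newindex[i] <- newindex[i - 1] + eps
    }
  }
  newindex
}

fixed <- bump_until_increasing(c(0, 0, 0, 0, 2))
print(fixed)                                  # [1] 0 1 2 3 4
print(is.unsorted(fixed, strictly = TRUE))    # [1] FALSE
```

Because each element is compared with the adjusted value before it, the result is strictly increasing by construction. The cost is that the original non-duplicate value 2 becomes 4.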

ckatsulis · 2018-05-10T15:42:27Z

Main consideration here is if xts would be the class of choice for ultra low latency analysis in finance or other disciplines. If that is the case, as you have said earlier, support for more precise time indexing would be the solution. As I don't know the details of that solution, I will leave it to you to weigh implementation costs and potential issues with backwards compatibility.

joshuaulrich · 2018-05-10T16:57:46Z

I would like nanosecond resolution in xts, but that will take a bit of work. This problem could theoretically exist even with higher resolution index timestamps, but it would be less probable.

The current solution in this branch works around the issue by checking and ensuring that the value of newindex[i] is always greater than both index[i-1] and newindex[i-1]. The downside to this solution is that non-duplicate index values may change. I plan to add a warning whenever that happens before merging this branch and closing the issue. I can't think of a better general solution, but I'm open to suggestions.

braverock · 2018-05-14T15:40:42Z

This sounds correct to me.

It has always been possible that make.index.unique could push things past the next observed index, and that problem only got worse as reported data moved beyond millisecond precision.

Until xts supports nano-scale indexes, checking and warning the user that they may be doing something unintended with make.index.unique seems the best solution.

ghost · 2018-05-15T14:11:24Z

@joshuaulrich: "I would like nanosecond resolution in xts, but that will take a bit of work."

I showed a possible solution to this problem (#190 (comment)) and yes, it is a lot of work, but in my opinion it is worth taking up this effort. The key is to not break the xts R API for external packages.

If you decide on the proposed solution, then of course I will be very committed to helping as much as possible (especially in the case of C code).

This can happen if the cumulative epsilon for a set of duplicate index values is larger than the first unique index value that follows. We will overwrite that non-duplicate index value with the prior index value + eps when this happens, and warn the user. See #241.
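The overwrite-and-warn behavior described here could be sketched in base R roughly as follows. This is only an illustration under assumptions: `bump_and_warn()` and the warning text are hypothetical and not taken from xts, and the index is assumed to be in whole microseconds with eps = 1.

```r
# Hypothetical sketch of overwrite-and-warn (not the xts source).
# Assumption: index values are whole microseconds, eps is 1 microsecond.
bump_and_warn <- function(index, eps = 1) {
  newindex <- index
  overwrote_unique <- FALSE
  for (i in seq_along(newindex)[-1]) {
    if (newindex[i] <= newindex[i - 1]) {
      # The original value was not a duplicate, but the cumulative eps
      # of the preceding duplicates has caught up with it.
      if (index[i] != index[i - 1]) {
        overwrote_unique <- TRUE
      }
      newindex[i] <- newindex[i - 1] + eps
    }
  }
  if (overwrote_unique) {
    warning("a non-duplicate index value was replaced to keep the index increasing")
  }
  newindex
}

# A block of four duplicates overtakes the unique value 2, so this warns.
res <- withCallingHandlers(
  bump_and_warn(c(0, 0, 0, 0, 2)),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res)   # [1] 0 1 2 3 4
```

When the unique value is far enough from the duplicates, for example `bump_and_warn(c(0, 0, 5))`, nothing is overwritten and no warning is raised.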

joshuaulrich added the bug label May 8, 2018

joshuaulrich self-assigned this May 8, 2018

joshuaulrich closed this as completed in d487936 May 29, 2018

joshuaulrich added this to the Release 0.11-0 milestone Jul 30, 2018
