Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make.index.unique() can create unsorted index #241

Closed
joshuaulrich opened this issue May 8, 2018 · 4 comments
Closed

make.index.unique() can create unsorted index #241

joshuaulrich opened this issue May 8, 2018 · 4 comments
Assignees
Labels

Comments

@joshuaulrich
Copy link
Owner

joshuaulrich commented May 8, 2018

make.index.unique() should add a small eps to duplicate index values. This often works, but can fail when there are consecutive observations with the same timestamp, and first observation after the block of duplicate timestamps is less than the cumulative eps.

options(digits.secs = 6)
(x <- .xts(1:5, c(rep(0, 4), 2) / 1e6))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000000    2
# 1969-12-31 18:00:00.000000    3
# 1969-12-31 18:00:00.000000    4
# 1969-12-31 18:00:00.000002    5
(y <- make.index.unique(x))
#                            [,1]
# 1969-12-31 18:00:00.000000    1
# 1969-12-31 18:00:00.000001    2
# 1969-12-31 18:00:00.000002    3
# 1969-12-31 18:00:00.000003    4
# 1969-12-31 18:00:00.000002    5

It would be nice if this could be fixed, but floating point rounding error is likely to be a problem in the cases this may occur. A warning should be raised, at minimum.

Session Info

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.10-2.1 zoo_1.8-1   

loaded via a namespace (and not attached):
[1] compiler_3.4.3  tools_3.4.3     grid_3.4.3      lattice_0.20-35
@joshuaulrich joshuaulrich self-assigned this May 8, 2018
joshuaulrich added a commit that referenced this issue May 9, 2018
The functional change in this commit is adding eps when this is

    newindex_real[i] <= newindex_real[i-1]

instead of when this is true:

    index_real[i-1] == index_real[i]

This ensures the new index is always in increasing order, even
observation following a contiguous block of duplicate timestamps
less than the sum of the epsilons.

The memcpy ensures 'newindex' is equal to the current 'index'
the loop, which also means we longer need 'index_real'.

Fixes #241.
@ckatsulis
Copy link

Main consideration here is if xts would be the class of choice for ultra low latency analysis in finance or other disciplines. If that is the case, as you have said earlier, support for more precise time indexing would be the solution. As I don't know the details of that solution, will leave it to you to weight implementation costs and potential issues with backwards compatibility.

@joshuaulrich
Copy link
Owner Author

I would like nanosecond resolution in xts, but that will take a bit of work. This problem could theoretically exist even with higher resolution index timestamps, but it would be less probable.

The current solution in this branch works around the issue by always checking and ensuring that the value of newindex[i] is always greater than both index[i-1] and newindex[i-1]. The downside to this solution is that non-duplicate index values may change. I plan to add a warning whenever that happens before merging this branch and closing the issue. I can't think of a better general solution, but I'm open to suggestions.

@braverock
Copy link
Contributor

This sounds correct to me.

It has always been possible that make.index.unique could push things past the next observed index, and that problem only got worse as reported data moved beyond millisecond precision.

Until xts supports nano-scale indexes, checking and warning the user that they may be doing something unintended with make.index.unique seems the best solution.

@ghost
Copy link

ghost commented May 15, 2018

@joshuaulrich:

I would like nanosecond resolution in xts, but that will take a bit of work.

I showed a possible solution to this problem (#190 (comment)) and yes, it is a lot of work, but in my opinion it is worth taking up this effort. The key is to not break xts R API for external packages.

If you decide on the proposed solution, then of course I will be very committed to help as much as possible (especially in the case of C code).

joshuaulrich added a commit that referenced this issue May 29, 2018
This can happen if the cumulative epsilon for a set of duplicate index
values is larger than the first unique index value that follows. We
will overwrite that non-duplicate index value with the prior index
value + eps when this happens, and warn the user.

See #241.
@joshuaulrich joshuaulrich added this to the Release 0.11-0 milestone Jul 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants