to.period is not working as expected for the first record #153

Open
cloudcell opened this issue Aug 24, 2016 · 10 comments

Comments

@cloudcell
Contributor

The first endpoint is being assigned to the first record, regardless of time. In the following example the record with time 04:01 ends up not aggregated in the processed xts records.

Reproduction of the error:

  1. get this data
  2. use this code
xx <- to.period(GBPUSD,period = 'minutes', k=2)
head(GBPUSD)

Open High Low Close Volume
2002-10-21 04:01:00 1.5501 1.5501 1.5501 1.5501 22.2181704
2002-10-21 04:02:00 1.5501 1.5501 1.5501 1.5501 93.3404328
2002-10-21 04:03:00 1.5501 1.5501 1.5501 1.5501 25.7178698
2002-10-21 04:04:00 1.5501 1.5501 1.5501 1.5501 8.0730374
2002-10-21 04:05:00 1.5500 1.5500 1.5500 1.5500 1.9565426
2002-10-21 04:06:00 1.5493 1.5497 1.5493 1.5497 39.7676101

head(xx)

GBPUSD.Open GBPUSD.High GBPUSD.Low GBPUSD.Close GBPUSD.Volume
2002-10-21 04:01:00 1.5501 1.5501 1.5501 1.5501 22.218170
2002-10-21 04:03:00 1.5501 1.5501 1.5501 1.5501 119.058303
2002-10-21 04:05:00 1.5501 1.5501 1.5500 1.5500 10.029580
2002-10-21 04:07:00 1.5493 1.5498 1.5493 1.5498 122.681942
2002-10-21 04:09:00 1.5498 1.5498 1.5492 1.5492 62.382992
2002-10-21 04:11:00 1.5492 1.5492 1.5491 1.5491 63.479716

@joshuaulrich
Owner

joshuaulrich commented Aug 24, 2016

I'm not convinced this is a bug. If you look at the output of endpoints, you will see why to.period is returning the first row unchanged.

head(endpoints(GBPUSD, 'minutes', k=2))
#[1] 0 1 3 5 7 9

This is because the index for GBPUSD contains values at the beginning of the minute, not the end. So the end point for the first two minutes of 2002-10-21T04:00:00 is 2002-10-21T04:01:59.999.

If you subtract a small amount from each index value, you get the behavior you seem to expect. This is because the endpoints output changed to reflect the index value changes.

.index(GBPUSD) <- .index(GBPUSD) - 0.0001
options(digits.secs=6, width=120)
head(xx <- to.period(GBPUSD,period = 'minutes', k=2))
#                         GBPUSD.Open GBPUSD.High GBPUSD.Low GBPUSD.Close GBPUSD.Volume
#2002-10-20 19:01:59.9998      1.5501      1.5501     1.5501       1.5501     115.55860
#2002-10-20 19:03:59.9998      1.5501      1.5501     1.5501       1.5501      33.79091
#2002-10-20 19:05:59.9998      1.5500      1.5500     1.5493       1.5497      41.72415
#2002-10-20 19:07:59.9998      1.5498      1.5498     1.5497       1.5498     144.68929
#2002-10-20 19:09:59.9998      1.5494      1.5494     1.5492       1.5492      41.73473
#2002-10-20 19:11:59.9998      1.5491      1.5493     1.5491       1.5493     113.86047
head(endpoints(GBPUSD, 'minutes', k=2))
#[1]  0  2  4  6  8 10
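
The two endpoints results above can be reproduced with a small base R model. This is only a sketch: `model_endpoints` is a hypothetical helper, not the xts implementation, and it assumes the bins sit on a fixed grid counted from the epoch in UTC (which lines up with whole hours when k = 2). The timestamps are synthetic one-minute rows starting at 04:01, not the GBPUSD data.

```r
# Hypothetical helper, not the xts implementation: assign each timestamp to a
# fixed k-minute bin and report the last row of every bin, with a leading 0.
model_endpoints <- function(times, k = 2) {
  secs <- as.numeric(times)              # seconds since the epoch
  bin  <- floor(secs / (60 * k))         # bin id on a fixed grid
  last_in_bin <- which(diff(bin) != 0)   # rows after which a new bin starts
  unique(c(0, last_in_bin, length(secs)))
}

# Synthetic one-minute rows: 04:01, 04:02, ..., 04:10 (UTC is an assumption).
start <- as.POSIXct("2002-10-21 04:01:00", tz = "UTC")
times <- start + 60 * (0:9)

ep_raw     <- model_endpoints(times)            # index at the start of each minute
ep_shifted <- model_endpoints(times - 0.0001)   # index nudged just before the minute

print(ep_raw)      # 0 1 3 5 7 9 10
print(ep_shifted)  # 0 2 4 6 8 10
```

In this model the 04:01 row sits alone in the 04:00:00 to 04:01:59.999 bin, which is why it comes back unaggregated. After the shift it pairs with the following row instead.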

@cloudcell
Contributor Author

...the index for GBPUSD contains values at the beginning of the minute, not the end. So the end point for the first two minutes of 2002-10-21T04:00:00 is 2002-10-21T04:01:59.999.

Note that there is no data point for T= 2002-10-21T04:00:00. Therefore, the first two minutes must be in the range of 2002-10-21T04:01:00 and 2002-10-21T04:02:59.999.

@joshuaulrich
Owner

endpoints does not produce output based on first observed time in the data passed to it. The first element of endpoints' output is always zero, and in your example the second element will always be the location of the observation with an index value at or before 04:01:59.999, whether your data start at 04:00:00 or 04:01:59.998.
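
To illustrate that the first cutoff does not depend on where the data start, here is a minimal base R sketch. `bin_start` is a hypothetical helper, not part of xts, and the fixed grid aligned to whole hours in UTC is an assumption of the sketch.

```r
# Hypothetical helper: start of the fixed k-minute bin containing each timestamp.
bin_start <- function(times, k = 2) {
  width <- 60 * k
  as.POSIXct(floor(as.numeric(times) / width) * width,
             origin = "1970-01-01", tz = "UTC")
}

# Three possible first observations, all inside the same two-minute bin.
first_rows <- as.POSIXct(c("2002-10-21 04:00:00",
                           "2002-10-21 04:01:00",
                           "2002-10-21 04:01:59.998"), tz = "UTC")

starts <- bin_start(first_rows)
print(format(starts, "%H:%M:%S"))  # "04:00:00" for all three
```

Each of the three starting times lands in the bin that begins at 04:00:00, so under these assumptions the bin closes at 04:01:59.999 in every case.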

@cloudcell
Contributor Author

cloudcell commented Aug 25, 2016

My question is: how does endpoints find the first element? From what I read above, it seems that endpoints chooses 04:00:00 as a baseline (rounding down to the beginning of the hour), regardless of the time (in minutes) of our first record. So if the first record had a timestamp of, say, 06:53:48, the baseline for endpoints would be 06:00:00.000. Then it would increment by 2 minutes from there, and the records would be 'sifted' through the "grid" that endpoints generates, regardless of what data is being 'sifted'. Is that correct?

@joshuaulrich
Owner

joshuaulrich commented Aug 25, 2016

endpoints finds the first element the same way it finds all the elements. It doesn't actively "choose" any time from the input data as a baseline.

The baseline is determined by the period you choose. If you choose on = "hours", then endpoints will use XX:59:59.999 as the cutoff. For on = "seconds", the cutoff is XX:XX:XX.999. For on = "months", the cutoff is the first day of each month.

@cloudcell
Contributor Author

So the cutoff for on = "minutes" is the same as for on = "hours", that is, XX:59:59.999. Is that correct?

@joshuaulrich
Owner

No, the cutoff for on = "minutes" would be XX:XX:59.999. I.e. the end of every minute (assuming k = 1).

@cloudcell
Contributor Author

OK, we're getting somewhere. So if k = 2, as in our case, what will the cutoff be?

@joshuaulrich
Owner

Every 2 minutes. So, xx:x1:59.999, xx:x3:59.999, etc.

The first 2 minutes of every hour are from xx:00:00.000-xx:01:59.999. The next two minutes are xx:02:00.000-xx:03:59.999... and the last 2 minutes are xx:58:00.000-xx:59:59.999.
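
As a worked example of this grid, the arithmetic below places the 06:53:48 timestamp from the earlier question into its two-minute bin. It is plain base R arithmetic on seconds within the hour, not a call into xts, and the variable names are illustrative only.

```r
k <- 2
width <- 60 * k                        # bin width in seconds

secs_into_hour <- 53 * 60 + 48         # 06:53:48 is 3228 seconds into its hour
bin_id   <- secs_into_hour %/% width   # zero-based bin number within the hour
bin_from <- bin_id * width             # first second of that bin
bin_to   <- bin_from + width - 1       # last whole second of that bin

# Format seconds-within-the-hour as xx:MM:SS.
fmt <- function(s) sprintf("xx:%02d:%02d", s %/% 60, s %% 60)
cat(fmt(bin_from), "to", fmt(bin_to), "\n")   # xx:52:00 to xx:53:59

bins_per_hour <- 3600 %/% width        # 30 two-minute bins in every hour
```

So a first record at 06:53:48 would fall in the bin that closes at 06:53:59.999, not in a bin that starts at the record's own timestamp.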

@cloudcell
Contributor Author

cloudcell commented Aug 25, 2016

So, in a way, you can state that for periods larger than 1 minute, the grid is anchored at the boundary between the end of one hour and the beginning of the next, or XX:59:59.999, just like I wrote above. Well, it's not a bug then, but the help page for endpoints would be clearer if it mentioned these cutoff rules.
