df.ta.cdl_z() gives impossible high and low values #703

Open
JanHomann opened this issue Jul 27, 2023 · 1 comment
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

JanHomann commented Jul 27, 2023

pandas-ta       0.3.14b0
TA-Lib          0.4.27
yfinance        0.2.26

Problem Description

The current implementation of cdl_z() often produces candles that cannot exist: low values that are higher than the open, close, or high, and high values that are lower than the open, close, or low.

Example

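import pandas as pd
import pandas_ta as ta  # registers the df.ta accessor

df = pd.DataFrame()
df = df.ta.ticker("spy", interval='1h', start="2023-01-01", end="2023-07-27")  # needs yfinance; keep the returned frame
df[:"2023-07-27 15:30"].ta.cdl_z()  # z-scores each OHLC column separately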
Datetime                   open_Z_30_1  high_Z_30_1   low_Z_30_1  close_Z_30_1
2022-07-27 09:30:00-04:00          NaN          NaN          NaN           NaN
2022-07-27 10:30:00-04:00          NaN          NaN          NaN           NaN
2022-07-27 11:30:00-04:00          NaN          NaN          NaN           NaN
2022-07-27 12:30:00-04:00          NaN          NaN          NaN           NaN
2022-07-27 13:30:00-04:00          NaN          NaN          NaN           NaN
...                                ...          ...          ...           ...
2023-07-26 11:30:00-04:00     0.293027     0.557565     0.683786       0.51043
2023-07-26 12:30:00-04:00      0.51497     0.114745     0.389549      0.209932
2023-07-26 13:30:00-04:00     0.176863     0.404245     0.201786       0.23817
2023-07-26 14:30:00-04:00     0.217598      2.10736    -0.173332      0.351198
2023-07-26 15:30:00-04:00     0.334565     0.725941     0.360631        1.1173

Expected Behavior

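A valid candle must satisfy low <= min(open, close) and max(open, close) <= high. As the example above shows, the last row (15:30) has an open that is lower than the low (0.334565 < 0.360631) and a close that is higher than the high (1.1173 > 0.725941). This happens often: two rows higher (13:30), the open is again below the low (0.176863 < 0.201786), and three rows higher (12:30), the low is even higher than the high (0.389549 > 0.114745). The problem shows up in any dataset, so the exact data in this example is not needed to reproduce it.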
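To count how many rows are affected, the consistency rule can be written as a boolean mask. This is only a minimal sketch and not part of pandas-ta: `invalid_candle_mask` is a hypothetical helper name, and the sample frame simply re-types the last five rows of the output above.

```python
import pandas as pd

def invalid_candle_mask(z: pd.DataFrame,
                        o: str = "open_Z_30_1", h: str = "high_Z_30_1",
                        l: str = "low_Z_30_1", c: str = "close_Z_30_1") -> pd.Series:
    """True where a fully populated row violates
    low <= min(open, close) and max(open, close) <= high."""
    body_low = z[[o, c]].min(axis=1)
    body_high = z[[o, c]].max(axis=1)
    valid = (z[l] <= body_low) & (body_high <= z[h])
    # Warm-up rows containing NaN are not counted as violations.
    complete = z[[o, h, l, c]].notna().all(axis=1)
    return complete & ~valid

# Last five rows of the cdl_z() output shown in this issue.
sample = pd.DataFrame(
    {
        "open_Z_30_1": [0.293027, 0.51497, 0.176863, 0.217598, 0.334565],
        "high_Z_30_1": [0.557565, 0.114745, 0.404245, 2.10736, 0.725941],
        "low_Z_30_1": [0.683786, 0.389549, 0.201786, -0.173332, 0.360631],
        "close_Z_30_1": [0.51043, 0.209932, 0.23817, 0.351198, 1.1173],
    },
    index=["11:30", "12:30", "13:30", "14:30", "15:30"],
)

mask = invalid_candle_mask(sample)
print(mask)  # only the 14:30 row is a consistent candle
```

On a real result, `invalid_candle_mask(df.ta.cdl_z()).mean()` would give the fraction of inconsistent candles, assuming the default column names shown above.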

Additional Context

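This seems to be caused by the way the computation is performed. The time series for open, high, low and close are z-scored independently, so each column is shifted and scaled by its own rolling mean and standard deviation, and the ordering within a candle is not guaranteed to survive. They need to be z-scored together, OR the open, high, low and close need to be reassigned so that each candle is consistent again.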
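A tiny synthetic example illustrates the effect. The numbers are made up, and `rolling_z` is a simplified rolling z-score standing in for whatever cdl_z() does internally; it is not the library's code. Every raw low is below its high, yet after independent scaling the last low ends up above the last high.

```python
import pandas as pd

def rolling_z(x: pd.Series, length: int) -> pd.Series:
    """Simplified rolling z-score: (x - rolling mean) / rolling std."""
    return (x - x.rolling(length).mean()) / x.rolling(length).std()

# Made-up prices: the high rises steadily, the low jumps on the last bar.
raw = pd.DataFrame({
    "high": [10.0, 11.0, 12.0, 13.0],
    "low": [8.0, 9.0, 9.0, 12.0],
})

# Each column is scaled by its own rolling mean and standard deviation.
z = raw.apply(rolling_z, length=3)
print(z.iloc[-1])  # high is about 1.00, low is about 1.15
```

The low's last window (9, 9, 12) has a mean of 10, so 12 sits further above its own mean, in units of its own standard deviation, than 13 does within the high's window (11, 12, 13).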

One way to achieve this would be to first compute the rolling z-score of the closing prices and then construct rescaled candles around it that at least preserve the relative position of the open. In other words, if the price gapped up, it should still gap up in the z-scored version and not suddenly gap down just because of the z-scoring.
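One possible reading of this proposal is to take the rolling mean and standard deviation from the close only and apply that single shift and scale to all four columns. The sketch below is an assumption about how that could look, not the pandas-ta implementation; `cdl_z_shared` is a hypothetical name and the candles are randomly generated. Because every value in a row goes through the same increasing transform, the ordering inside each candle is preserved. The direction of a gap between consecutive candles is not strictly guaranteed, since neighbouring rows use slightly different means and standard deviations.

```python
import numpy as np
import pandas as pd

OHLC = ["open", "high", "low", "close"]

def cdl_z_shared(df: pd.DataFrame, length: int = 30) -> pd.DataFrame:
    """Z-score all OHLC columns with the close's rolling mean and std."""
    mu = df["close"].rolling(length).mean()
    # A flat window has zero std; use NaN there instead of dividing by zero.
    sigma = df["close"].rolling(length).std().replace(0, np.nan)
    return df[OHLC].sub(mu, axis=0).div(sigma, axis=0)

# Synthetic but valid candles: a random-walk close, an open near the
# previous close, and high/low wrapped around the candle body.
rng = np.random.default_rng(0)
n = 200
close = pd.Series(100 + rng.normal(0, 1, n).cumsum())
open_ = close.shift(1).fillna(close.iloc[0]) + rng.normal(0, 0.3, n)
high = np.maximum(open_, close) + rng.uniform(0, 0.5, n)
low = np.minimum(open_, close) - rng.uniform(0, 0.5, n)
candles = pd.DataFrame({"open": open_, "high": high, "low": low, "close": close})

z = cdl_z_shared(candles).dropna()
body_low = z[["open", "close"]].min(axis=1)
body_high = z[["open", "close"]].max(axis=1)
print(((z["low"] <= body_low) & (body_high <= z["high"])).all())
```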

JanHomann added the bug (Something isn't working) label Jul 27, 2023
JanHomann changed the title from "df.ta.cdl_z() gives inconsistent high and low values" to "df.ta.cdl_z() gives impossible high and low values" Jul 27, 2023
twopirllc removed their assignment Aug 10, 2023
twopirllc added the help wanted (Extra attention is needed) label Sep 11, 2023
@twopirllc
Owner

@JanHomann

I see. 🤔

> One way to achieve this would be to first compute the rolling z-score of the closing prices ...

Since we have the whole candle (ohlc), what about using hl2 or oc2 or ohlc4 instead of close? Wouldn't one of those mean values be better? Thoughts?
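To compare these options, the reference series that supplies the rolling mean and standard deviation can be made a parameter. This is a rough sketch under assumptions: `cdl_z_ref` and `REFERENCES` are hypothetical names, hl2, oc2 and ohlc4 are taken to mean (high+low)/2, (open+close)/2 and (open+high+low+close)/4, and the six candles are made up. Whichever reference is chosen, all four columns share one shift and scale per row, so the candle stays consistent; only the centering and scale differ.

```python
import numpy as np
import pandas as pd

OHLC = ["open", "high", "low", "close"]

# Candidate reference prices (assumed definitions, computed by hand).
REFERENCES = {
    "close": lambda d: d["close"],
    "hl2": lambda d: (d["high"] + d["low"]) / 2,
    "oc2": lambda d: (d["open"] + d["close"]) / 2,
    "ohlc4": lambda d: d[OHLC].mean(axis=1),
}

def cdl_z_ref(df: pd.DataFrame, reference: str = "ohlc4", length: int = 30) -> pd.DataFrame:
    """Z-score all OHLC columns with the rolling stats of one reference series."""
    ref = REFERENCES[reference](df)
    mu = ref.rolling(length).mean()
    sigma = ref.rolling(length).std().replace(0, np.nan)  # avoid division by zero
    return df[OHLC].sub(mu, axis=0).div(sigma, axis=0)

# Six made-up, valid candles; a short window keeps the example small.
candles = pd.DataFrame({
    "open": [10.0, 10.5, 11.2, 10.8, 11.5, 12.1],
    "high": [10.8, 11.4, 11.6, 11.7, 12.3, 12.4],
    "low": [9.7, 10.3, 10.6, 10.7, 11.4, 11.6],
    "close": [10.5, 11.2, 10.8, 11.5, 12.1, 11.8],
})

results = {name: cdl_z_ref(candles, name, length=3).dropna() for name in REFERENCES}
for name, z in results.items():
    print(name, z.iloc[-1].round(3).to_dict())
```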

Kind Regards,
KJ
