Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hilbert Transform Trendline, closes issue #411 #759

Merged
merged 4 commits into from
Mar 13, 2024

Conversation

aligheshlaghi97
Copy link

@aligheshlaghi97 aligheshlaghi97 commented Feb 2, 2024

I implemented the Hilbert Transform Instantaneous Trendline in pandas-ta by studying the mathematical principles of the Z-transform and reverse-engineering TALIB's C implementation. Through rigorous testing, I ensured that the results produced by ht_trendline match those generated by TALIB's implementation, thus enhancing the technical analysis capabilities of the pandas-ta library and closing issue #411 .

In addition, I further enhanced its performance by leveraging the Numba library, thereby significantly boosting the execution speed. This experience allowed me to deepen my understanding of Numba's capabilities and its integration within the project structure. Furthermore, I refined my Git proficiency by creating a new branch locally and generating the pull request towards the development branch. By following best practices and maintaining clean commit histories, I ensured seamless integration with minimal conflicts, streamlining the merge process for @twopirllc .

The code for test is inspired by @rafalsza provided in #411 :

from talib import HT_TRENDLINE

# Define the ticker symbol and time period
ticker = "BTC-USD"
start = "2020-01-01"
end_date = "2022-01-01"
interval = "1d"

# Retrieve the stock data using yfinance
data = yf.download(ticker, start, interval=interval)

# Extract the OHLC data
ohlc_data = data.loc[:, ["Open", "High", "Low", "Close"]]

# Calculate TALIB ht_trendline
talib_itrend = HT_TRENDLINE(ohlc_data['Close'].values)

# Calculate ht_trendline with pandas-ta
pandas_ta_itrend = ht_trendline(ohlc_data['Close'], talib=False)

# Create the figure and add Candlestick and Line traces
fig = go.Figure()

fig.add_trace(
    go.Candlestick(
        x=data.index,
        open=ohlc_data["Open"],
        high=ohlc_data["High"],
        low=ohlc_data["Low"],
        close=ohlc_data["Close"],
        name="OHLC",
    )
)
fig.add_trace(go.Scatter(x=data.index, y=talib_itrend, mode="lines", name="TALIB-ITrend"))

fig.add_trace(go.Scatter(x=data.index, y=pandas_ta_itrend['ht_trendline'].values, mode="lines", name="pandas-ta-ITrend"))

# Set the layout
fig.update_layout(
    title="ht_trendline, TALib vs Pandas-ta implementations",
    xaxis_rangeslider_visible=False,
    xaxis_title="Date",
    yaxis_title="Price",
    legend=dict(x=0.9, y=1, bgcolor="rgba(255, 255, 255, 0.5)", bordercolor="rgba(0, 0, 0, 0.5)"),
    showlegend=True,
#     yaxis=dict(type="log"),  # Set the y-axis scale to logarithmic
)

# Show the plot
fig.show()

Here you can see the results of the the two which are very much the same:
Screenshot from 2024-02-02 23-28-11

I also found this piece of code very useful for testing pandas-ta functions under development without the necessity of local installation or modification of the current library version.

import sys
import importlib
module_name = "pandas-ta"

sys.path.insert(1, '/path/to/pandas-ta')
from pandas_ta import ht_trendline

@twopirllc
Copy link
Owner

Hello @aligheshlaghi97,

On first glance it looks good. I'll take a look this week. Also all the other Hilbert Transform indicators need to be converted as well, so this is a great starting point. Thanks for helping with this. 😎

KJ

@aligheshlaghi97
Copy link
Author

Hi @twopirllc

Thank you for taking the time to review the PR. I'm glad to hear that the initial feedback is positive. Could you please elaborate more on what you mean by converting all HT indicators? And also how much should I consider the failed workflows as they may not belong to HT indicator?

Bests,
Ali

@twopirllc
Copy link
Owner

@aligheshlaghi97

Could you please elaborate more on what you mean by converting all HT indicators?

Sure. There are a handful left to complete that begin with HT_. I want to get those completed as well as rewrite td_seq before making a new Pandas TA release.

And also how much should I consider the failed workflows as they may not belong to HT indicator?

I can write the tests for it. No worries. 😎

@aligheshlaghi97
Copy link
Author

Thank you for the additional information, @twopirllc . I'm eager to assist with completing the remaining HT indicators and rewriting td_seq for the upcoming Pandas TA release.

Could you please let me know if there are any specific notes, resources, or guidelines available for implementing the HT indicators and completing the td_seq rewrite?

Additionally, I wanted to mention that during the implementation of ht_trendline, I found myself delving into calculations related to ht_dcperiod (dominant cycle calculations). As such, I believe I can contribute to other aspects such as phase or sine calculations if needed. Please let me know how I can best support the completion of these indicators.

@twopirllc
Copy link
Owner

@aligheshlaghi97

I'm eager to assist with completing the remaining HT indicators and rewriting td_seq for the upcoming Pandas TA release.

I appreciate all the help. In no rush.

Could you please let me know if there are any specific notes, resources, or guidelines available for implementing the HT indicators and completing the td_seq rewrite?

The best source is TA Lib since that is default values that Pandas TA is expected to return. Otherwise it should return TV values. I'll dig up any other info I can and make a later comment.

Additionally, I wanted to mention that during the implementation of ht_trendline, I found myself delving into calculations related to ht_dcperiod (dominant cycle calculations).

I figured as much. I wonder if they can all be done in one indicator? I would need to dive into the code to be sure though.

@aligheshlaghi97
Copy link
Author

Hi @twopirllc,

Thank you for your explanations and clarifications. Based on the indicators in TALib prefixed with 'HT_', I believe it's entirely feasible to consolidate all the HT indicators into one. They share many common elements, particularly in the calculation of dcperiod. For instance, the variable period is calculated consistently across nearly all the other indicators, as evident in the source code here.

I wanted to inquire about the level of expectation regarding the similarity between our results and TALib's. Is it anticipated that our results match TALib's precisely, or is there room for minor discrepancies?

@twopirllc
Copy link
Owner

@aligheshlaghi97

I wanted to inquire about the level of expectation regarding the similarity between our results and TALib's. Is it anticipated that our results match TALib's precisely, or is there room for minor discrepancies?

Yes. It has been acheiveable with most Pandas TA indicators to match those of TA Lib. Just get a close as you can and I will try and fix the differences. 😎

@twopirllc
Copy link
Owner

@aligheshlaghi97

Apologies for taking a bit to get to this.

Updated Development Environment

I am now using Python 3.11.7 versus 3.9.18 as well as required Python packages for Pandas TA:

numba==0.59.0
numpy==1.26.4
pandas==2.2.0
pandas-datareader==0.10.0
pyarrow==15.0.0
scipy==1.12.0
TA-Lib==0.4.28
yfinance==0.2.36

I have also cleaned up most Depreciation and Future Warnings messages due to the environment update.

HT_TRENDLINE

While in the process of incorporating your version of ht_trendline, I have that ht_trendline and TA Libs HT_TRENDLINE are not identical (not counting the prenans) like you have shown above. I am curious, are there any actual differences between your numpy version and TA Lib's starting from row/bar 63? Though they eventually converge to the same values. Can you tell me more about which version of Python and packages you are using?

My results for Python 3.9 and 3.11:

Screenshot 2024-02-19 at 3 42 24 PM Screenshot 2024-02-19 at 3 42 52 PM

Thanks
KJ

@aligheshlaghi97
Copy link
Author

Thank you for the update, @twopirllc

I appreciate your efforts in updating the development environment and addressing the Deprecation and Future Warnings messages.

In regards to the versions, I'd like to confirm that both Python 3.9 and 3.11, along with their respective packages, yield identical results for ht_trendline. I've been working with Python version 3.10.12 and the following package versions:

numba==0.58.0
pandas==1.4.1
scipy==1.9.0
TA-Lib==0.4.28
yfinance==0.2.18

Regarding the differences observed between my code and TALib's, it's possible that the variance stems from the initial point for calculating ht_trendline, resulting in non-identical outputs at the beginning of the array. However, as you rightly mentioned, they converge and maintain good coverage thereafter.
I'll delve deeper into the code and if I find a solution to make them identical, I will provide an update accordingly.

Best regards,
Ali Gheshlaghi

@twopirllc
Copy link
Owner

@aligheshlaghi97

No worries. I just wanted to make sure that we were both getting similar results for bars between bars 63 and 167 followed by convergence starting from bar 168 on.

I'll delve deeper into the code and if I find a solution to make them identical, I will provide an update accordingly.

No rush. I'll be diving into it as well while also tackling some other issues.

Appreciate the help. 😎

@aligheshlaghi97
Copy link
Author

Thanks @twopirllc for your update.
After testing the code I shared earlier, I obtained the following result:

Screenshot from 2024-02-21 18-19-07

This indicates that the convergence point of my data is at the 116th candle for BTCUSDT-1D data, spanning from 2020 to the present. I am now even more confident that the issue lies with the initial data point, and I will work on resolving it.

@aligheshlaghi97
Copy link
Author

@twopirllc

I've also found something else worth sharing. If you decrease the index at which the calculations start from 50 to 5, the convergence point will occur earlier. Please make this change in my implementation, on line 30:

Screenshot from 2024-02-21 19-16-05

While this adjustment seems logical, we can't set it to zero, as Elher indicates in his article, and we can also find some clues to this in TALib's implementation. However, by making this minor change, the convergence point decreases from the 116th candle to the 95th for BTCUSDT-1D data, spanning from 2020 to the present. Moreover, MSE of the result decreases a lot.

@twopirllc twopirllc merged commit 410224c into twopirllc:development Mar 13, 2024
1 check failed
@twopirllc
Copy link
Owner

Hey @aligheshlaghi97

It's up and ready to go on the development version. Let me know how it works for you. Thanks again for the submission. 😎

@aligheshlaghi97
Copy link
Author

Hi @twopirllc,

Thank you for merging the pull request and implementing the changes. It's great to see progress in resolving the differences between our implementation and TALib's.

Upon reviewing the new code, I noticed a longer time taken to converge between our implementation and TALib's, as depicted in the provided screenshots:

Screenshot from 2024-03-14 05-16-41

Screenshot from 2024-03-14 05-16-53

However, I also noticed that the mean difference between our implementation and TALib's is very low:

Screenshot from 2024-03-14 05-18-26

This suggests that while our implementation is more accurate in terms of mean error, it may take longer to converge compared to TALib's.

I was also curious about your thoughts on the idea of mixing all ht_ indicators together. Do you think it's feasible now, or do you have any concerns about this approach?

Bests

@twopirllc
Copy link
Owner

@aligheshlaghi97

Yes, that is true it does take a little longer to converge and has generally a smaller mean error. The truth is neither of us were able to exactly replicate TA Libs calculation and that is ok. Sometimes there will be differences in calculation no matter what. A 99% correlation is not bad considering the differences in execution and the original intent from Ehler's EasyLanguage code from which I ultimately chose to default to. However I believe it may be possible to get them exact with more time. The core groundwork has been laid and anyone is welcome to improve it. If you would like your version included as well, let me know, we can do that. No biggie.

I was also curious about your thoughts on the idea of mixing all ht_ indicators together. Do you think it's feasible now, or do you have any concerns about this approach?

In nearly all ta_HT_*.c files they have similar code up to:

smooth_period[i] = 0.33 * period[i] + 0.67 * smooth_period[i - 1]

So yeah, one or two can be combined easily. The others may need some juggling to do so.

Cheers

@aligheshlaghi97
Copy link
Author

Hi @twopirllc

Thank you for your insights.

I appreciate the clarification. I'm glad to hear that some ht_ indicators can be combined easily. Could you please suggest which indicator would be most aligned with your opinion to work on next?

Bests

@twopirllc
Copy link
Owner

@aligheshlaghi97

Could you please suggest which indicator would be most aligned with your opinion to work on next?

DC Period and that one is trivial as returning with trendline in the numba function. I'll dig in again after I finish this other task.

Apologies for the delayed response

@aligheshlaghi97
Copy link
Author

Hi @twopirllc

Hope you're doing well. I'm still excited about the potential to integrate other APIs for the ht_ indicators.
Did you have any time to work on it? And if so, I'm curious to know would it be better to implement shared sections (e.g. calculation of dcperiod) in a separate module or to implement all of them first and then separate the shared sections of code in the future?
I'd also like to know what I need to do to be listed as a contributor of pandas-ta. Please let me know if there are any specific requirements.

Best regards,
Ali Gheshlaghi

@twopirllc
Copy link
Owner

Hey @aligheshlaghi97,

I'd also like to know what I need to do to be listed as a contributor of pandas-ta. Please let me know if there are any specific requirements.

There are no specific requirements and was unintentional. I apologize for it somehow not going through, I must have done something incorrectly through the merge process such that it did not include you in the contributors. 🤦🏼‍♂️ I will try to rectify it if you make a trivial PR so Github recognizes you as a contributor. Irregardless, I remember all the main contributors, like you, to the project. 😎

Did you have any time to work on it?

No. I had to deal with some other umm... "pressing" issue and it's post-effects. But it is still on my mind cause I want to get it done.

And if so, I'm curious to know would it be better to implement shared sections (e.g. calculation of dcperiod) in a separate module or to implement all of them first and then separate the shared sections of code in the future?

As we discussed earlier, I think we should try to roll as much as possible into ht_trendline's core loop to minimize overall loop calculations when processing a bunch of indicators. If not, we can try coupling the most similar ht_ indicators together.

I also have to do this with linreg after including @Rossco8 's speed improvements in #776.

KJ

@aligheshlaghi97
Copy link
Author

Hi @twopirllc

Sorry for my delayed response; I was working on improving the performance of the ht_trendline indicator.

As we discussed earlier, I think we should try to roll as much as possible into ht_trendline's core loop to minimize overall loop calculations when processing a bunch of indicators. If not, we can try coupling the most similar ht_ indicators together.

In my trials, it seems it's not possible to further optimize our current ht_trendline. I also experimented with some code modifications, but they resulted in incorrect outputs. On the other hand, we are closely following the c code implementation of TA_Lib. As a conclusion, I agree with coupling the shared part of ht_ indicators together (maybe by abstracting).

I will try to rectify it if you make a trivial PR so Github recognizes you as a contributor

I believe this issue arises because we haven't had any releases for a while (or any merges with main branch), so the contribution list isn't updated. However I'll definitely make new PRs.

Irregardless, I remember all the main contributors, like you, to the project. 😎

This is an honor for me🎉

@twopirllc
Copy link
Owner

Hey @aligheshlaghi97,

Sorry for my delayed response; I was working on improving the performance of the ht_trendline indicator.

No worries. Even I have long response delays as well, especially since this repo is nowhere close for me to support it full time. 😑

In my trials, it seems it's not possible to further optimize our current ht_trendline. I also experimented with some code modifications, but they resulted in incorrect outputs. On the other hand, we are closely following the c code implementation of TA_Lib.

Thanks for the optimization confirmation.

As a conclusion, I agree with coupling the shared part of ht_ indicators together (maybe by abstracting).

I agree. Which abstraction technique did you have in mind? I do not want to use classes due to their potential execution overhead.

This is an honor for me🎉

Your contributions and collaboration has been most gracious. You don't happen to have a Stripe (or similar) account by chance?

KJ

@aligheshlaghi97
Copy link
Author

Hi @twopirllc
Thanks for your response,

especially since this repo is nowhere close for me to support it full time. 😑

I understand, and I am very curious to know whether you're working on your public projects e.g. AlphaVantageAPI (which sounds interesting to me) or mainly on private ones.

I agree. Which abstraction technique did you have in mind?

I think abstract class could be useful for us. We can also use python protocols to remain decoupled and only use type checks.

I do not want to use classes due to their potential execution overhead.

This sounds cool, but there is debates about the performance impact, and we might need to conduct our own survey. Please look at this link about class overhead vs function

Btw, if you don't want to use class's methods, there will be absolutely no difference and we can define one function to calculate the shared parts. And then use that function inside our new APIs.

Shared Parts of ht_ Indicators

Our desired ht_ indicators from TA_Lib are: ht_dcperiod, ht_dcphase, ht_phasor, ht_sine, ht_trendline, ht_trendmode. The shared section of code for these six indicators is calculation of period (in my opinion) which happens in all of these indicators c implementation. Please take a look at this link which shows how period is calculated in those six indicators. Or you can simply search for this line of code in TA_Lib: period = (0.2*period) + (0.8 * tempReal);

You don't happen to have a Stripe (or similar) account by chance?

Unfortunately I don't have and cannot open one due to restrictions of my country, so I use cryptocurrency for my daily payments.

@twopirllc
Copy link
Owner

Hey @aligheshlaghi97

I understand, and I am very curious to know whether you're working on your public projects e.g. AlphaVantageAPI (which sounds interesting to me) or mainly on private ones.

Mainly private ones. I haven't touched AlphaVantageAPI in a long time, too busy with Pandas TA. 😑

I think abstract class could be useful for us. We can also use python protocols to remain decoupled and only use type checks.

Yeah an ABC could be useful. Haven't heard of protocols, they seem like they can help. 🤷🏼‍♂️ If you want to try that approach, that would be cool.

This sounds cool, but there is debates about the performance impact, and we might need to conduct our own survey. Please look at this link about class overhead vs function

Yeah I remember reading something like that. That the performance difference is negligible.

Btw, if you don't want to use class's methods, there will be absolutely no difference and we can define one function to calculate the shared parts. And then use that function inside our new APIs.

We will find a path, time willingly.

Our desired ht_ indicators from TA_Lib are: ht_dcperiod, ht_dcphase, ht_phasor, ht_sine, ht_trendline, ht_trendmode. The shared section of code for these six indicators is calculation of period (in my opinion) which happens in all of these indicators c implementation. Please take a look at this link which shows how period is calculated in those six indicators. Or you can simply search for this line of code in TA_Lib: period = (0.2*period) + (0.8 * tempReal);

I will try and expand nb_ht_trendline to include the remaining ht_ indicators.

Yeah, period = (0.2*period) + (0.8 * tempReal) and re, im are also calculated similarily. They are basically ema's of length 9.

Unfortunately I don't have and cannot open one due to restrictions of my country, so I use cryptocurrency for my daily payments.

Damn! I checked stripe and they don't support crypto there either. 😑 And I almost never use crypto. I'll look more into it when I can.

@aligheshlaghi97
Copy link
Author

Hi @twopirllc
Thanks for your response.

Mainly private ones. I haven't touched AlphaVantageAPI in a long time, too busy with Pandas TA. 😑

Totally understandable and hopefully I'd be able to help you with some Pandas-TA's loads.

Yeah an ABC could be useful. Haven't heard of protocols, they seem like they can help. 🤷🏼‍♂️ If you want to try that approach, that would be cool.

Sure, but I think our coding signature in Pandas-TA is to keep everything in function level as you said before, and using protocol requires class anyways. So I'm wondering function is better than class in our case.

I will try and expand nb_ht_trendline to include the remaining ht_ indicators.

Thank you so much for this, as I'm really eager to learn how you'll do it.

My Suggestion

I think you should make another nb_ function named nb_ht_phasor, because the shared section between all ht_ indicators is inside ht_phasor functionality. The period calculation section we talked about it earlier, period = (0.2*period) + (0.8 * tempReal), is the last thing done inside ht_phasor.

Inside of nb_ht_phasor function, you should include line 34 to line 67 of our nb_ht_trendline implementation.

This way we can simply use nb_ht_phasor inside all other ht_ indicator implementations and be DRY.

And if you want to implement sth like my suggestion, please let me do that, so you might have more free time for your other tasks.

Damn! I checked stripe and they don't support crypto there either. 😑 And I almost never use crypto. I'll look more into it when I can.

Don't worry about it, your attention and your trial is so much valuable for me.

Best Regards,
Ali Gheshlaghi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants