Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration stops working after several hours #3

Open
darootler opened this issue Dec 2, 2019 · 103 comments
Open

Integration stops working after several hours #3

darootler opened this issue Dec 2, 2019 · 103 comments
Labels
bug Something isn't working

Comments

@darootler
Copy link

Hi!

I am using Home Assistant 0.102.3 and Home Connect 0.2.3 (installed via HACS). For me this integration just stops updating the entities after several hours. If i restart HA the integration is working again for several hours.

I have set up an automation wich notifies me if the dishwasher finished the program, this automation just stops working because the sensor never updates:

image

Anything i am doing wrong here? Any suggestions?

Regards
Richard

@DavidMStraub
Copy link
Owner

Is there anyting in the logs? If now, could you enable debug logging and check? Maybe the SSE listener drops connection and doesn't reconnect (which it should in principle) or maybe token refresh failed (which is now handled internally by Home Assistant).

When the sensor stops updating, can you try if you can still do things like starting programs or switching on/off? I'm asking because these trigger direct calls to the API rather than listening to a stream, so if they don't work either, it indicates a permission problem or the API being down.

@darootler darootler changed the title Integrations stops working after several hours Integration stops working after several hours Dec 2, 2019
@darootler
Copy link
Author

Is there anyting in the logs? If now, could you enable debug logging and check? Maybe the SSE listener drops connection and doesn't reconnect (which it should in principle) or maybe token refresh failed (which is now handled internally by Home Assistant).

When the sensor stops updating, can you try if you can still do things like starting programs or switching on/off? I'm asking because these trigger direct calls to the API rather than listening to a stream, so if they don't work either, it indicates a permission problem or the API being down.

I will check that tomorrow. Enabling debug logging on my raspberry will lock the SD card down and make the whole system nearly unusable, maybe we can determine the problem without that.

Regards
Richard

@DavidMStraub
Copy link
Owner

OK thanks in advance!
(In principle I think one can enable debug logging for individual components, but I never actually tried.)

@darootler
Copy link
Author

darootler commented Dec 2, 2019

OK thanks in advance!
(In principle I think one can enable debug logging for individual components, but I never actually tried.)

Yes, thats possible. If we need it i will try that.

Regards
Richard

@darootler
Copy link
Author

I just checked the entities and they are "out of sync" now:

image

image

Is there a way to help troubleshoot the problem? There are no warnings or errors according to home connect in the logs.

Regards
Richard

@DavidMStraub DavidMStraub added the bug Something isn't working label Dec 4, 2019
@DavidMStraub
Copy link
Owner

Did you try if you can still switch power or start programs after they go out of sync? If no, does that lead to something in the log? If yes, I think it's defintely due to SSE dropping connection, that would be good to know.

@darootler
Copy link
Author

Did you try if you can still switch power or start programs after they go out of sync? If no, does that lead to something in the log? If yes, I think it's defintely due to SSE dropping connection, that would be good to know.

I am able to start the dishwasher but that does not update the sensors. What is SSE?

Regards
Richard

@DavidMStraub
Copy link
Owner

Thanks! That's indeed useful.
SSE (server sent events) is a stream that allows the API to push events rather than requiring polling. Home Assistant opens a separate thread for each HC device and listens to events on these streams, but it seems the connection dies and is not restarted.

@darootler
Copy link
Author

I see. Are you able to reproduce that behavior? Is there anything i can do?

Regards
Richard

@DavidMStraub
Copy link
Owner

Yes, I think I have also run into this problem now. I will have to look at the SSE implementation in the homeconnect Python library and see if I can find a bug there. I'll update you when there is something new to test.

@lancer73
Copy link

Now I am completely green with respect to this code (been using HA for two weeks now), but if nothing is going over the line, the connection will get killed by a router doing NAT.

@DavidMStraub
Copy link
Owner

Oops. As stated in the commit message, this is an attempted fix. I'll now update the HA component with this and would be grateful about testing.

@darootler
Copy link
Author

Will this potential fix be part of a release or do i have to update the custom component manually?

Regards
Richard

@DavidMStraub
Copy link
Owner

I will publish a release later today.

@ghost
Copy link

ghost commented Jan 9, 2020

Still same issue when used in HA.

Br
Reidar

@lancer73
Copy link

I had the impression that before te latest changes, with HA 103.5 it was better. Everything was still working after 24H.

@DavidMStraub
Copy link
Owner

I don't see how it could have gotten worse. So what are you seeing now? Any errors in the log? Do all your appliance stop updating at the same time? (Each of them is listened to in a separate thread, so I would expect them not to fail simultaneously.)

@darootler
Copy link
Author

I can confirm that this problem still persists.

Regards
Richard

@DirkSDT
Copy link

DirkSDT commented Feb 3, 2020

With 104.3 HA stopps updating after 20 - 24 hours
The Log shows this:
homeassistant | File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 72, in __next__ homeassistant | File "/usr/local/lib/python3.7/site-packages/homeconnect/api.py", line 217, in _listen homeassistant | File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 81, in __next__ homeassistant | File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 51, in _connect

@DavidMStraub
Copy link
Owner

Thanks! Does the traceback continue? I would be curious which exception it raises.

@DirkSDT
Copy link

DirkSDT commented Feb 3, 2020

�[36mhomeassistant |�[0m Exception in thread Thread-3:
�[36mhomeassistant |�[0m Traceback (most recent call last):
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 72, in next
�[36mhomeassistant |�[0m next_chunk = next(self.resp_iterator)
�[36mhomeassistant |�[0m StopIteration
�[36mhomeassistant |�[0m
�[36mhomeassistant |�[0m During handling of the above exception, another exception occurred:
�[36mhomeassistant |�[0m
�[36mhomeassistant |�[0m Traceback (most recent call last):
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
�[36mhomeassistant |�[0m self.run()
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/threading.py", line 870, in run
�[36mhomeassistant |�[0m self._target(*self._args, **self._kwargs)
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/homeconnect/api.py", line 217, in _listen
�[36mhomeassistant |�[0m for event in sse:
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 81, in next
�[36mhomeassistant |�[0m self._connect()
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/homeconnect/sseclient.py", line 51, in _connect
�[36mhomeassistant |�[0m self.resp = requester.get(self.url, stream=True, **self.requests_kwargs)
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 546, in get
�[36mhomeassistant |�[0m return self.request('GET', url, **kwargs)
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/requests_oauthlib/oauth2_session.py", line 478, in request
�[36mhomeassistant |�[0m url, http_method=method, body=data, headers=headers
�[36mhomeassistant |�[0m File "/usr/local/lib/python3.7/site-packages/oauthlib/oauth2/rfc6749/clients/base.py", line 198, in add_token
�[36mhomeassistant |�[0m raise TokenExpiredError()
�[36mhomeassistant |�[0m oauthlib.oauth2.rfc6749.errors.TokenExpiredError: (token_expired)

@DavidMStraub
Copy link
Owner

Oooh. That might be why I wasn't able to understand this. It's not actually a problem with the SSE stream, it's a problem with auth! Thanks a lot for this hint!

Now the question is why the token is not refreshed. I thought that HA internally takes care of this but maybe I misunderstood. I will ask the core devs.

@DavidMStraub
Copy link
Owner

I am using the actual master branch (0512cb0) and for me the integration is working. Could you please change the following error to a warning like we already discussed above?

I am monitoring my HA instance for errors and i am getting a few messages a day about this error. Also maybe a new release with the actual master should be released?

Finally done.

Closing 🎉

@darootler
Copy link
Author

@DavidMStraub

Thank you.

I am using HACS to maintain Home Connect and i am only able to choose version 0.2.4, am i doing something wrong?

74A518F0-8F1A-47A5-844C-00608DF2281B

Regards
Richard

@DavidMStraub
Copy link
Owner

Hi,

I haven't made any change to the homeassistant-homeconnect repository yet, I will make a new release there later or tomorrow.

@darootler
Copy link
Author

Hi,

I haven't made any change to the homeassistant-homeconnect repository yet, I will make a new release there later or tomorrow.

Hi David,

did you manage to make a new release by chance?

Regards
Richard

@DavidMStraub
Copy link
Owner

No, but you can use the master branch in the meantime.

@freesty82
Copy link

Hi,
i still have this problem, i'm using the official HA integration. I need to use the beta version?
Thanks

@DavidMStraub
Copy link
Owner

No, they are the same. Enable debug logging and monitor the log.

@FlavorFx
Copy link

FlavorFx commented Sep 14, 2020

I still have this issue. I prepared a log with enabled debug information:
debug file
First I restarted HA. The home connect entities in HA are still unavailable. After a while I turned on my washing machine. You can see it in the log. HA still shows unavailable entities. After 1,5 hours I started by washing machine once again. This time there are no log entries any more and still no HA entities availabilities.

Pls reopen this issue and help to find the root course why this integrations does not update the current status and why the integrations stop working after a while.

@DavidMStraub DavidMStraub reopened this Sep 14, 2020
@DavidMStraub
Copy link
Owner

I'm happy to reopen the issue, but honestly I don't know what to do about it.

@FlavorFx
Copy link

FlavorFx commented Sep 16, 2020

Hi David,
I setup a Docker dev environment, pulled the last sources and run HA with debug informations. Please see my two debug fines. debug file 1 and debug file 2
One shows a lost connection after 1,5hours. The other did not show an update of the integration anyhow. This is what I noticed on my UI.

  1. HA started
  2. UI Integrations shows "unavailable"
  3. Turn on ma washing machine
  4. Selected different washing program
  5. UI Integrations shows still "unavailable"
  6. Turned off the washing machine
  7. HA terminated

There seems to be a connection established but no update to the corresponding integration. After a while the integration loose it's connection. Do you have any idea how to improve it? Pls help.

@FlavorFx
Copy link

Hi David, It seems that the connections get lost after 1,5 hours and it failed to reconnect. Any update or idea on that? Is there a way to support you?

@DavidMStraub
Copy link
Owner

Hi,

thanks for your debug logs, but I still don't know what the cause could be.

The second log doesn't show any hint.

In the first log, there is an exception in the SSE stream, which I also observed to happen occasionally, which is why I implemented the automatic reconnection. That's aso why this exception is actually not logged as an error, but only as a warning in HA. The log shows that the reconnection is triggered, but I don't know if it succeeded. It's the second to last entry in the log. The last entry, homeassistant.components.websocket_api.http.connection.140059420551200, I think is unrelated to the SSE stream as it is raised in the main thread. No idea what is causing that.

Don't get me wrong, I don't doubt that there is a problem - I also lose the connection to my appliances. In my current HA run, it occured after 48 hours. But since I don't know what's causing it, I don't know how to debug.

My suspicion is actually that it is a problem with the token refresh, which is handled by HA internally and is unrelated to the homeconnect library.

My suggestion is to open an issue in the HA core library, unless you can reproduce the issue with this Python library directly.

@FlavorFx
Copy link

FlavorFx commented Oct 1, 2020

Hi David,
thanks for taking a closer look. It's good to know that you are also seeing a disconnection. Will the reconnect establish a new connection? if so, that is the difference. As you can see from the log, homeconnect tries to reconnect for 90 minutes without success before I finished HA. I have to restart HA to get a new connection.
Further I tried your example program of homeconnect. Again, the connection is terminated after a certain time.
Is there something wrong with the _connect (reconnect)? If I understand your code correctly there is a timeout of 3 sec. Would the sample program log helpful for you?

@DavidMStraub
Copy link
Owner

What I just noticed is that I can actually still switch on my oven, even 3 days after the integration stopped updating. This proves that it is a problem with the SSE after all and not with the tokens, as the tokens still work (they need to be refreshed at least once every 24 h and HA takes care of that).

Can you please check whether that also works for you?

@duftstenen
Copy link

I can confirm this scenario as well. I can turn on the coffee machine, but the integration has stopped reporting

@FlavorFx
Copy link

FlavorFx commented Oct 3, 2020

Well, I have to turn on my washer manually and after that it reports during this time (after I restarted HA). But already after the second run the report update stops.

@DavidMStraub
Copy link
Owner

OK so I observed something interesting. This time I had an internet outage and the updates just stopped, without any messages in the log. So my theory is that the SSE just drops from time to time when there is a network bottleneck or a hiccup of the API server. The question is just how to detect those disconnects ...

@DavidMStraub
Copy link
Owner

Oh wow I think I finally found a way to solve this. By getting inspiration from the competition: the API server actually sends a KEEP-ALIVE event once a minute. I hadn't used that at all so far. So in principle, all we need to do is listen for it and disconnect-reconnect when we haven't seen one for more than a minute. I'm confident this will work!

@DavidMStraub
Copy link
Owner

I just released a new version (and updated the dependency in the custom component repo) that should be able to reconnect in case of a stale connection.

However, it does not yet address what happens if the connection is closed and the reconnection fails (e.g. if the server is temporarily unavailable or the internet connection temporarily interrupted).

Still, it would be great if someone could test this - it may well be that this solves most problems.

@FlavorFx
Copy link

FlavorFx commented Oct 6, 2020

I think you are on the right track. My washing machine is in the basement and there I have a powerline connection. May be there is an unreliable connection. I will test your fix.

@FlavorFx
Copy link

FlavorFx commented Oct 8, 2020

Unfortunately, I still have this issue. I prepared another log. I merged your changes in tip revision, restarted HA, and started my washer. HC updated accordantly. After a while I noticed a read time out. And after another time a lost the connection and HA does not update the integration.

@DavidMStraub
Copy link
Owner

OK thanks. At 11:26 the new timeout fired and the reconnection was successful. At 11:47 it fired again and then at 11:52 it failed to connect. This is basically what I mentioned above, a failing reconnection because the server cannot be reached is still not handled. But I don't now what's causing it.

@FlavorFx
Copy link

Hi David, you can close it. It doesn't occurred on my live system. The error occurred on my docker dev system due to the system suspect mode.

@firefart
Copy link

just tested this on a fresh hass installation and it still looses the connection after some time and it’s fixed on a reload. is there a way to enable debug logging only for this integration to help identify the issue?

@DavidMStraub
Copy link
Owner

Hi, issues with the Home Assistant integration should be reported at https://github.com/home-assistant/core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

10 participants