Heavy performance drops after setting up the integration #199

Closed
collateral87 opened this issue May 18, 2024 · 12 comments
@collateral87

Version of the custom_component

0.6.5 (and tried main)

Describe the bug

Directly after I set up the integration, Home Assistant experienced significant performance drops. I tried to restart, but the core would not start again, and I had to force a shutdown of the machine.
Unfortunately, I don't have any logs because of this.
After removing the integration, everything ran normally again with the usual performance.

I have 8 BT proxies (ESP) and 1 BT adapter (ASUS). None of them are assigned to areas.

Is this a known issue or has anyone had similar experiences?

@agittins
Owner

I'm really sorry to hear that!

There have been some changes in recent HA versions (around 2024.5 I believe) that appear to have made it less stable when integrations do non-threadsafe things, leading to (effectively) lock-ups. The changes were noted as adding stability but I've only seen it cause more frequent problems - which leads to fixes ultimately, but a lot of user frustration along the way!
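
For anyone wondering what "non-threadsafe" means in practice, here is a minimal sketch in plain Python asyncio. It is not Home Assistant's or Bermuda's actual code, and the names (`on_loop`, `worker`, `run_demo`) are made up for illustration. The idea is that code running in a worker thread should hand work to the event loop with `loop.call_soon_threadsafe` rather than touching loop-owned objects directly.

```python
import asyncio
import threading


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    events: list[str] = []
    done = asyncio.Event()

    def on_loop(msg: str) -> None:
        # Runs inside the event loop thread, so touching loop-owned
        # state (the list and the asyncio.Event) is safe here.
        events.append(msg)
        done.set()

    def worker() -> None:
        # Runs in a separate thread. Calling on_loop(...) or done.set()
        # directly from here would be the non-threadsafe pattern;
        # instead, ask the loop to run the callback on its own thread.
        loop.call_soon_threadsafe(on_loop, "advert received")

    thread = threading.Thread(target=worker)
    thread.start()
    await asyncio.wait_for(done.wait(), timeout=5)
    thread.join()
    return events


def run_demo() -> list[str]:
    # Hypothetical entry point for the illustration.
    return asyncio.run(main())
```

Calling `run_demo()` returns the list of messages that were delivered through the loop. The non-threadsafe variant may appear to work most of the time, which is part of why these bugs are hard to reproduce.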

Do you know what version of main you tried? I have been pushing a few changes up of late, so it's possible the one you grabbed might have been before a fix. Anyway, I have just pushed another potential fix, and bundled everything up into a release, so if you're able to see if v0.6.6 works for you that would be awesome.

I know it's a big ask, but if v0.6.6 is still a problem, would you be able to do some testing for me? The problem is that I have been unable to replicate some of these issues on my systems, which makes finding a fix so much harder.

As suggested in https://community.home-assistant.io/t/2024-5-tracking-down-instability-issues-caused-by-integrations/724441, could you add this to your configuration.yaml and see if it gives you any helpful output:

homeassistant:
  debug: true

You might need to be able to SSH into your box to be able to get the logs, so I understand if this is a bit much to ask, especially if your system is actively running stuff in your home. I really hope I can get these things sorted out quickly, it's just tricky when I can't get my own systems to show the error. Thanks for your patience, though, and I really appreciate you taking the time to report your experience!

@agittins added the moreinfo label (More information required to progress further) on May 24, 2024
@agittins self-assigned this on May 24, 2024
@srempfer

I had a similar experience after installing the integration (v0.6.6) 3 weeks ago:

  • Installed the integration
  • It worked for a few minutes and then slowed down the system (Raspberry Pi 4 with HA 2024.5.5)
  • The system became unresponsive, so a restart was not possible
  • Unplugged power to restart the system

I checked the logs and found a warning from the integration that there is a device (a Shelly with Bluetooth proxy enabled) which has no assigned area. Unfortunately I don't have the exact log anymore.

Since I added an area to the device the problem is gone.
I'm not sure if the restart or the assignment of the area fixed the problem.

In the near future I'll integrate some more Shelly devices, which will give me the chance to run some tests and maybe reproduce the problem.

Thanks for the great project, exactly what I looked for.

@jack3308

jack3308 commented Jul 1, 2024

Running into similar here. Running HAOS 2024.6.3 on Pi 4B 4gb w/ ssd for storage. No addons to speak of, fair few integrations, and usually it zips right along, but something with this integration seems to be slowing things down. As soon as I disable it things are quick as normal, but with it enabled everything is noticeably slower, particularly booting up.

Now, I'm gonna caveat all of that and say that this is a fantastic tool and absolutely killer integration, so please don't think I'm displeased or anything. Just wanting to add that it seems to be happening to me too, and I'm willing to provide what support I can in terms of logs and whatnot if you want to have a look at them.

@agittins
Owner

agittins commented Jul 1, 2024

Oh, interesting!

So it seems there is a general performance issue rather than the complete wedging / spinning issues that came with the async changes, then.

@jack3308 do you have any proxies not assigned to areas? (there will be regular log entries about it if that's the case)

The slow boot-up is a bit perplexing. A gradual decrease in performance over time could point to a memory leak or possibly extra tasks being scheduled that shouldn't be, but slow bootup is odd, because unlike many integrations we are only working with local data so no remote APIs to contact etc, and startup is usually lightning fast as a result.

If you can enable debug logging for a minute or so, then disable it and send the resulting file, that might shed some light. Note that the file will contain MAC addresses and possibly IPs etc, so redact anything you're not comfortable with sharing, or email directly to me at ash@ajg.net.au if you'd rather not post publicly.
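
If it helps with the redaction, something along these lines could be run over the log text before sharing it. This is only a rough sketch: the patterns and the `redact` helper are assumptions for illustration, not part of Bermuda or Home Assistant, and they only catch colon- or dash-separated MAC addresses and dotted IPv4 addresses.

```python
import re

# Six hex pairs separated by ":" or "-", e.g. AA:BB:CC:11:22:33
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")
# Four dot-separated groups of 1-3 digits, e.g. 192.168.1.20
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str) -> str:
    """Replace MAC addresses with stable placeholders and mask IPv4 addresses."""
    seen: dict[str, str] = {}

    def mac_placeholder(match: re.Match[str]) -> str:
        # Normalise case and separator so the same device always
        # maps to the same placeholder.
        key = match.group(0).upper().replace("-", ":")
        if key not in seen:
            seen[key] = f"MAC_{len(seen) + 1}"
        return seen[key]

    text = MAC_RE.sub(mac_placeholder, text)
    return IPV4_RE.sub("x.x.x.x", text)
```

Each distinct MAC gets its own placeholder (`MAC_1`, `MAC_2`, ...), so individual devices stay distinguishable in the redacted log. The IPv4 pattern is deliberately loose and will also mask any other string of four dot-separated numbers, so it is worth skimming the result before sending it.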

Also if you could let me know how many devices and entities you have enabled (just the numbers, as shown in the screenshot below) that will be helpful, too.
image

@jack3308

jack3308 commented Jul 2, 2024

At the moment, only tracking 2 devices (my phone and my partner's) using iBeacon from the HASS companion app on Android. I worked my way through the beacons I have and I don't think there are any not assigned to an area.

Thank you so much for your work on this, it's truly the best option by miles for room presence and the methodology and aspirations are miles ahead of other options. Not that you need to hear it from me, I'm sure, but it's seriously night and day how much more clever and useful this is, so thanks for all of your effort. I'll email the logs directly to you. I'm in a bit over my head when it comes to debugging an integration, so if I've done something wrong please let me know.

@formatBCE

formatBCE commented Jul 5, 2024

Uhh, it worked OK with 1 tracker and 1 device to track. I hoped to replace my clunky custom solution with this one. (Though I remember that ESPHome BLE tracker is buggy crap, that's why I came up with my own one in the first place.)
But no miracle - after updating 5 other trackers with BLE tracker, HA just went unresponsive, and after I deleted that BLE code and updated the ESPs, HA rebooted and started working again. Before this happened, I saw that my only device tracker was going to unavailable and back to home once per second.
Deleting. It should've been the best attempt on presence, with device trackers inbuilt - but unfortunately it's not. :)

@agittins
Owner

agittins commented Jul 5, 2024

@formatBCE I am having trouble understanding what you mean.

You mention "BLE tracker", are you referring to the esp32_ble_tracker config in your esphome devices? If so, and turning that on and off causes HA to lock up, can you confirm that doing so behaves differently when Bermuda is installed vs when it's not?

If your esphome device was also dropping in and out (going unavailable and back every second) does this also happen when Bermuda isn't running but the esphome is configured as a proxy?

A full copy of your esphome yaml might be helpful to diagnose this. Also, are you in a fairly "dense" bluetooth environment, like an apartment building or other area where there are "many" bluetooth devices around? I am wondering if perhaps the volume of BT traffic might be triggering an issue, either on the esphome side or perhaps in Bermuda.

I've had my system previously running with a cache of tens of thousands of addresses (most were transient or rotated random macs that were no longer present, of course) and it didn't cause any issues other than a slightly longer processing loop (each second, when in debug mode, HA will tell you how long Bermuda spent processing its data). It got up to about 0.1 seconds or so at worst. More recent releases have record pruning so you wouldn't see anywhere near that number of addresses any more, but if many active devices are around it could possibly increase the processing load.
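
As a rough illustration of what record pruning means here, stale addresses can be dropped by comparing each one's last-seen timestamp against a cutoff. This is a hypothetical sketch, not Bermuda's actual implementation; `prune_stale`, `last_seen` and `max_age` are invented names.

```python
def prune_stale(
    last_seen: dict[str, float], now: float, max_age: float
) -> dict[str, float]:
    """Return only the addresses heard from within max_age seconds of now.

    last_seen maps an address to the timestamp (in seconds) of its
    most recent advertisement.
    """
    return {
        address: stamp
        for address, stamp in last_seen.items()
        if now - stamp <= max_age
    }
```

With a cache like `{"a": 100.0, "b": 995.0}`, calling `prune_stale(cache, now=1000.0, max_age=60.0)` keeps only `"b"`. The point is that the per-second processing loop then scales with the number of recently active devices rather than with every address ever seen.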

I'd want to confirm that the esphome setup with bluetooth_proxy turned on and Bermuda turned off was stable first, though - that could be an indicator of something else going on.

@formatBCE

@agittins sorry, I won't be helpful here. I reverted everything back, and can't afford to reproduce that behaviour again... Will use my old stuff.
Yes, I referred to esp32_ble_tracker. A couple of years ago I tried to play with ESPHome BLE tracking, but had only bad experiences with the inbuilt mechanism... So I suspect that it plays at least some part in that hanging.

@agittins
Owner

I am going to close this issue since I think the original problem @collateral87 experienced was probably some race conditions that were exposed after changes in recent versions of HA. As far as I know these are bedded down since v0.6.7.

@srempfer and @jack3308 your issues seem to be performance related rather than the complete wedging that the async issues caused. Since you're both on Raspberry Pis it's possible that memory usage or log/database churn might be contributing factors (Jack, I note you're using an SSD, which should be pretty solid; Sean, I'm not sure whether you are on an SD card or SSD). SD cards in particular are prone to I/O issues, which Bermuda is likely to make worse, but it's hard to say for sure.

I have just written up a wiki page on how to manage database IO if that's a possible cause: https://github.com/agittins/bermuda/wiki/Logs,-Recorder-and-Database-size

@jack3308 did you end up emailing me any logs? I haven't found anything but it might be in spam (I get a lot of that, given I keep putting my email address in public places!). If you have sent it let me know something I can search on to check my mailbox, eg if jack is in your address that would be enough for me to find it, I think. If you have the time, could you open a new issue for your performance problem? I think it's probably quite different from what the OP had, so would rather track it separately.

@formatBCE I'm not sure what hardware you are running on, but your experience is definitely not typical. If you do decide to give it another go please feel free to raise another issue if you run into trouble again. My home production server is tracking 22 devices over 418 entities (I have a lot of the extra distance sensors enabled) and it's totally stable, with cpu usage around 10% to 20% typically, most of which is from other integrations. Database output is significant though, and I run a separate postgres db with over 200GB of history.

The once-per-second flipping from unavailable is weird, and might indicate the integration was crashing and reloading, or just that one of the timeout settings was set too low. Anyway, if you do decide to give it another crack I'm happy to help track down the issue.
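
To show how a too-low timeout alone could produce that kind of flapping, here is a toy model. It is purely illustrative: `presence_timeline` and its parameters are hypothetical and do not correspond to any real Bermuda setting or code path. A device counts as "home" only if an advert arrived within `timeout` seconds of the moment it is checked.

```python
def presence_timeline(
    advert_times: list[float], timeout: float, horizon: int
) -> list[str]:
    """Sample the device state once per second from t=0 to t=horizon."""
    states = []
    for t in range(horizon + 1):
        # Adverts that have already arrived at time t.
        heard = [a for a in advert_times if a <= t]
        is_home = bool(heard) and (t - max(heard)) <= timeout
        states.append("home" if is_home else "unavailable")
    return states
```

With adverts arriving every 2 seconds, `presence_timeline([0, 2, 4], timeout=0.5, horizon=5)` alternates between "home" and "unavailable" on every sample, while `timeout=3` stays "home" throughout. In other words, a timeout shorter than the gap between adverts is enough to cause the flipping without anything crashing.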

For now though, I'll close this issue and invite y'all to pop open a new one if it comes up again.

@agittins removed the moreinfo label (More information required to progress further) on Jul 10, 2024
@jack3308

@agittins email should start with "jk@". Sorry, custom domain so this happens sometimes.

@srempfer

I'm running my HA on an SSD.

Last weekend I integrated some more Shelly devices, which gave me the chance to run some tests, but I couldn't reproduce the problem with either HA 2024.5.5 or HA 2024.7.1.

@agittins
Owner

For anyone following this issue, it seems from @jack3308's logs that Bermuda is having a bad interaction with another integration, which seems to be generating tens of thousands of bluetooth device entries.

If anyone is experiencing performance issues and has some time to try a thing or two, it would be great if you could:

  • Update to the main version via HACS
  • Check how many scanners and devices Bermuda reports on the first page of its "Configure" dialog
  • If it's "a lot", then select the "Download Diagnostics" option in the Bermuda menu and share the file on Performance Issues: #234

Obviously you won't be able to do this if your system completely wedges up, so I totally understand that not everyone will be able (or willing!) to give this a shot.

Device and scanner counts (only in main, currently):
image

Download Diagnostics (only in main, currently):
image
