essmqtt stops updating values after 38 hours of runtime #7
Comments
As a quick and dirty fix, I'll try restarting the service every 24 hours with:
|
I'm trying to reproduce this - with 38 hours until it crashes it sounds like it will be hard to debug. Could you check whether on your system the memory footprint of essmqtt grows -- e.g. compare memory use right after starting the service to memory use after a few hours? |
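One way to make the before/after memory comparison concrete on Linux is to read the resident set size from `/proc/<pid>/status`. The following is a minimal sketch, not part of essmqtt; the helper names `parse_vmrss_kib` and `read_rss_kib` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a running process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_rss_kib` with the service's PID (shown, for example, in the "Main PID" line of `systemctl status essmqtt.service`) right after start and again a few hours later gives two numbers to compare; a steadily growing value would hint at a leak.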
Alright, I took the automatic restart out again and restarted the service taking a screenshot of |
24 hours are over now and I have just checked the status of the service:
All of a sudden the system doesn't like
any more, but the communication keeps on working and the only change in |
Thank you for testing that, @fu-zhou. These screenshots suggest that no significant memory leak is involved. I'm already trying to reproduce the issue with my ESS, but so far essmqtt is running smoothly (I started it on May 24th at 19:40, so it has already been running for more than 38 hours). However, I used the default poll interval of 10 seconds; I'll switch to your two seconds and see whether that breaks it faster. I will know on Thursday whether it crashes for me after 38 hours with your poll rate. Edit to add: I have a few ideas that I'm planning to try, but fixing the issue would be most convenient if I could reproduce the hanging process locally. |
with the automatic restart after 24 hours it seems to work so far... I'll keep an eye on it! |
@fu-zhou good to hear you have a workaround for the time being. I managed to reproduce the hanging process on my system and will now try to fix the problem. I managed to debug into the process, but in the crashed state I can't find any obvious clues as to why it crashed. Fixing will take some time because it takes a while to crash it. Feel free to poke me every few weeks. |
Just updated to 0.1.10 more or less right before I reached 38 hours with 0.1.8 (1 day 12 hours). In the meantime I changed the poll timing to 2 sec (home) and 10 sec (common, divisor=5), as the common values go into the database only for statistics purposes, while I try to do somewhat real-time calculations and predictions with the home values. On top of that, I got a strange error message from the PCS (only once, a couple of days ago, when running at 2 and 2 seconds):
Since the error pointed at communication, I reduced the load (divisor=5). However, I have no clue what SDSP stands for, and sure enough the LG manual doesn't explain it either. |
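The timing described above (home every 2 seconds, common every 10 seconds with divisor=5) can be illustrated with a small sketch. It assumes that the divisor means "fetch common on every divisor-th poll cycle"; this is an interpretation of the comment, not taken from essmqtt's code, and `poll_schedule` is a hypothetical name.

```python
def poll_schedule(base_interval_s, divisor, cycles):
    """Yield (time_s, topics) for each poll cycle.

    "home" is fetched on every cycle; "common" only on every
    divisor-th cycle (assumed meaning of the divisor setting).
    """
    for cycle in range(cycles):
        topics = ["home"]
        if cycle % divisor == 0:
            topics.append("common")
        yield cycle * base_interval_s, topics


for time_s, topics in poll_schedule(base_interval_s=2, divisor=5, cycles=6):
    print(time_s, topics)
```

Under this assumption a divisor of 5 cuts the number of common requests to a fifth, which is the load reduction the comment is after.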
I have been running with a two-second interval for more than 38 hours now, and pyess still submits values. I'll keep checking it, but the most recent submit might have fixed the issue. Unfortunately I don't know what SDSP is either; the manual is indeed not helpful. I'll report if I observe the same issue. In theory my ESS should be pretty busy: essmqtt polls it every 2 seconds, and on top of that it gets polled by the graphite/carbon bridge every 10 seconds. So maybe that's enough to trigger the error message. |
I got a firmware update pushed on my PCS today, so I turned the timing back to 2 and 2 seconds in order to see if anything changed regarding the values in home and common. |
@fu-zhou Thank you for mentioning the firmware upgrade, there was one available for the HOME5 version I use as well. I applied it. Strangely for my HOME5 ess I have not observed any crashes of essmqtt since applying v0.1.10 so I will close this issue in a few days. |
Hi @gluap: your SW versions are totally different from the ones on the ESS 8, it seems that there is a major difference in the two systems, however pyess works on both and that is important! One of the differences seems to be that YOU can pick and choose the firmware while on my ESS 8 I have no options or choices, the firmware got pushed onto the system without notification and without documentation what changed and even worse: The firmware introduced even more unwanted behaviours than the sign of the active power. I opened a service case with LG asking them to tell me how to downgrade from the buggy firmware to the working one - no reply yet. |
**EDIT: I strongly discourage HOME5 users from upgrading to the newest version manually. After the upgrade my ESS worked fine for a few hours. Then it started reporting "Battery overvoltage" and entered a restart loop because of it. To fix it, I tried upgrading to a different firmware version with assistance from the LG hotline; that version claimed that no battery and/or AC was connected. To at least get back to a state where the ESS can operate without a battery, I tried applying the initial firmware update again.**
@fu-zhou Getting new firmware pushed forcefully is really bad style. But if it's any consolation: I can't pick and choose either. I found only one new firmware version available that could be applied. There was no older firmware version to go back to, and until the end of April there were no firmware updates at all. I also found no changelog to judge whether the update would be worthwhile. The process of upgrading involves copying firmware files onto three USB sticks, inserting them into the USB port in the right order, and hoping that you're not accidentally flashing the wrong firmware, because there is virtually no documentation on the process. During the process the lights flash differently from what the upgrade manual describes, which made me worry that I might have bricked the device. So the manual process is far from optimal, too.
My main reason for applying the update was that my ESS kept forgetting that it is supposed to use the battery "economically" and almost always charged it full as fast as it could in the morning. Later, around noon, it would then throw away usable energy to meet the 70% feed-in and 5 kW AC converter limits instead of charging the battery with the excess energy as it is supposed to. At first glance this seems to be fixed in the current firmware. It was working sporadically in the past as well, though, so I have to observe it for some time to be sure.
By the way, if you want to avoid further forced upgrades, you may be able to prevent them by disabling the data upload to enervu in the admin menu. I saw one manual claim that this would also disable automatic updates. (Caveat: I have no idea whether that affects the manufacturer warranty.) |
To be honest: I eagerly waited for the firmware as it was supposed to fix a couple of unwanted behaviours, which it didn't, it introduced new ones. To be really sure regarding the internet connection, I typically block the PCS in my FritzBox. |
Just before I reached 38 hours the service was restarted. Do you know if there is a logfile stored somewhere which contains details what caused the restart? Do you recommend that I take out the restart option from the service in order to get details why the service fails from time to time? |
The log is printed out to the console, so it should end up in the systemd logs. Systemd logs from essmqtt can be accessed via
Unfortunately, I can't leave my ESS running for 38 hours at the moment because it enters the restart loop as soon as the battery is full. At that point I have to disable the battery for it to at least feed the grid. I plan to fully drain the battery tonight in the hope that the system can learn the charge curve again if it can observe a full battery cycle. If that doesn't fix it, I'm opening a support call tomorrow. I am really annoyed by the buggy firmware. |
Man, their quality control sucks! Here's the relevant extract:
Can you read anything from that? This sequence shows up multiple times over the last days, I have attached an ASCII file generated with |
In your longer log I see two different errors - One is the one that you also pasted above, the other seems to indicate that some time around 0:47 or so the mqtt server was down. I have changed the error handling such that whenever mqtt is down the whole communication is re-initiated. It should now be able to deal with both errors. Nevertheless I'm not sure whether it wouldn't strictly be more correct to just exit with an error code when one remote party doesn't respond and leave it to systemd to decide whether or not communication should be re-initiated. |
The MQTT server can be down for a couple of minutes once per day because a system backup starts at 02:00 in the morning, but not at 0:47. That was probably indeed a system fault or network error, and the essmqtt service did the right thing: it restarted, and the communication didn't stop while the service was running (the 38 hours behaviour you fixed). |
Small update: essmqtt didn't restart automatically yet, runtime is 1 day and 19 hours, however the communication was re-established after 36 hours and 50 minutes, message-log is attached. To prevent misunderstandings: I posted the log in case it shows valuable hints for you, not to build a work around ;-) |
@fu-zhou great, the log indicates that the automatic reconnect added with 0.1.11 works. The code prints a stacktrace to keep information on where a problem is happening, but the fix I added on Sunday resolves the issue by starting over without crashing, so in this case the stacktrace and especially the warning messages around it are a good sign: the broken-connection detection works now. The current approach is to automatically restart on all errors that are known to be caused by MQTT or ESS connection problems. When an unknown error type is encountered, essmqtt will still exit and leave the mess to be dealt with by systemd. I will close this issue now, thank you again for your extensive testing! |
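The approach described above (start over on known connection errors, exit on anything else) could look roughly like the sketch below. It is not the actual essmqtt code: `KnownConnectionError`, `connect`, and `poll_once` are hypothetical stand-ins for the real MQTT/ESS error types and functions.

```python
import time


class KnownConnectionError(Exception):
    """Hypothetical stand-in for known MQTT or ESS connection errors."""


def run_with_reconnect(connect, poll_once, delay_s=0.0, sleep=time.sleep):
    """Poll forever; re-initiate the connection after known connection errors.

    Any other exception is not caught, so the process exits with an error
    and the service manager (e.g. systemd) decides whether to restart it.
    """
    while True:
        try:
            session = connect()        # (re-)initiate all communication
            while True:
                poll_once(session)     # fetch from the ESS and publish
        except KnownConnectionError:
            sleep(delay_s)             # brief pause, then start over
```

Keeping the set of caught exceptions narrow is the design choice here: only errors that are understood trigger a reconnect, and everything else still surfaces as a crash with a stacktrace in the log.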
I'm still running essmqtt at a 2-second update cycle for both "common" and "home". For the 3rd day in a row, essmqtt stops updating the values after 38 hours of runtime (1 day, 14 hours). The MQTT server keeps on running as far as I can tell, and restarting the server doesn't fix the problem. The service needs to be restarted (systemctl restart essmqtt.service). Immediately after restarting, the values are being updated in the server. systemctl status essmqtt.service doesn't deliver any hint regarding the issue: