Frequent IRC disconnects #1255
Comments
I've been chatting in Discord about this myself, though it may not be the exact same issue. My bridge just seems to stop relaying messages coming from IRC to other outputs. In some testing I noticed that matterbridge was not responding to ping requests from IRC. I updated the client as well as ZNC (where I connect my matterbridge client), and looking at the debug logs I see the requests coming in and out. Anyway, there might be something there where matterbridge does not reconnect as quickly as it's supposed to. The devs will ask you for your config and logs, since you did not provide any, so gathering those would help narrow the issue down.
@jajabro1 There are logs in OP's message; they are collapsed inside spoilers. Click on the lines starting with the triangle to expand them.
@Lucki can you put |
The debug level should already be more verbose. It's set to 2 because I tried to get more info than with 1, but it seems there's only one level. Does it have to be specifically 1?
Yes, it needs to be specifically 1.
Here's my own experience with this. I put twitch in debug level 1, and here's an example of what happens.
This is a jump in time from 9:53 to 10:45 when there should have been hundreds of messages in between them |
I had the same problem with my setup but the problem has gone away after going back to v1.18.2 |
Today matterbridge had another disconnect. |
Another one with less noise this time
A rather short disconnect
Confirmed, I also got this issue on v1.18.3 and am having to revert as it's unusable.
Is this something to do with melody and the websocket for ping messages? |
I'm getting this using older matterbridge versions now which is leading me to believe that freenode have changed the way they deal with pings or are being regularly DDOS'd? |
Something seems to have happened at Freenode. I can't connect at all anymore, getting this error:
@TheHolyRoger I was also thinking something like that. @Lucki what do you use the API for? Do you actually use it?
Yes, we're using the api to push some news every now and then. I've activated the api a few days after setting up matterbridge and we've already observed disconnects before. Will temporarily deactivate it to be sure. |
Not something I can reproduce with or without the API enabled. If you know how to build your own binary, please build from master. Now you'll need to set |
@42wim combined with this patch #1261 and increasing my VM resources, I've stopped this from happening. I think my ping responses were possibly being queued too long and the overall load of my VM was suddenly much higher without any changes. The above patch has helped lower my overall load by ignoring tengo-blanked/empty messages. But I still think freenode servers might also be under heavy load as this is a new issue with old versions. Matterbridge was using far less resources previously... |
@TheHolyRoger I'm running current master and my bridge only uses 35MB of RAM. What resource usage do you have?
@42wim it's not the memory usage, it's the CPU load |
@TheHolyRoger I'm not sure why the CPU load could be that high (because it shouldn't), how many bridges and how many messages/second do you think you do? |
@42wim I've downloaded your build artifact, with debug level 2: The bridge lost connection at 23:03:13 and reconnected at 23:20:02, as reported by other clients.
Ok, your log clearly shows that matterbridge sends a ping at 23:01:13 and at 23:02:13, receives no response, and thus tries to reconnect. This seems to be a freenode issue (or a bad connection to freenode?). Is your CPU maxed out on that system? It may be starvation.
@lrstanley do you maybe have any idea?
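For readers following along, the timeout behaviour described in that log can be sketched roughly as below. This is only an illustration under stated assumptions, not girc's or matterbridge's actual code: `pongOverdue`, `clock`, and `assumedTimeout` are hypothetical names, the two-minute timeout is an assumption, and the last PONG at 23:00:13 is assumed rather than taken from the log.

```go
package main

import (
	"fmt"
	"time"
)

// assumedTimeout is a hypothetical value chosen for illustration only.
const assumedTimeout = 2 * time.Minute

// pongOverdue reports whether the connection should be treated as dead:
// a PING was sent after the last PONG arrived, and that PING has been
// waiting for an answer for at least timeout.
func pongOverdue(firstUnansweredPing, lastPong, now time.Time, timeout time.Duration) bool {
	if !firstUnansweredPing.After(lastPong) {
		return false // the PING in question was answered
	}
	return now.Sub(firstUnansweredPing) >= timeout
}

// clock builds a timestamp on an arbitrary fixed day from hh:mm:ss,
// so the timeline from the log can be written down directly.
func clock(h, m, s int) time.Time {
	return time.Date(2000, time.January, 1, h, m, s, 0, time.UTC)
}

func main() {
	lastPong := clock(23, 0, 13)  // assumed, not from the log
	firstPing := clock(23, 1, 13) // first PING without a response

	// When the second PING goes out, the first one is only a minute old.
	fmt.Println(pongOverdue(firstPing, lastPong, clock(23, 2, 13), assumedTimeout))
	// A minute later the first PING has been unanswered for two minutes.
	fmt.Println(pongOverdue(firstPing, lastPong, clock(23, 3, 13), assumedTimeout))
}
```

Note that a check like this cannot tell a silent server apart from a client that received the PONG but has not processed it yet.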
I can't imagine either of these two are the problem. We've used https://github.com/qaisjp/go-discord-irc right before switching to matterbridge and it was running without such problems.
There's barely any load on the 16 cores, not sure how to exactly get that information but
We're currently located at https://www.strato.de/ so that shouldn't (?) be an issue. |
Seems like go-discord-irc only pings every 4 minutes when there is no traffic, or every 15 minutes when there is traffic.
@Lucki I've just added a |
Thanks, will give it a shot. |
My particular issue still hasn't resurfaced with the changes I've mentioned... My IRC disconnects were occurring during times of very high traffic, when some messages were delayed or out of order by 10-30 seconds or more. The error was "timed out waiting for a ping", though. As mentioned here #1261 (comment), my messages per second are sometimes far less than 1 per second, sometimes 5-10+ (that may even be a big underestimation on my part).
Unfortunately we've observed two disconnects tonight, even with the longer ping delay of 4 minutes.
@Lucki See https://github.com/42wim/matterbridge/actions/runs/324854685 for a build with yet more debug code that I modified in the girc library. It now prints the RAW events directly before processing, along with some more processing stats. If this doesn't show anything more useful, I'm out of options.
Thank you, hopefully there's something interesting in here: Bridge disconnects at 11:17 and reconnects at 11:57. I'm honestly really confused by now. Not only does the bridge lose its connection, which shouldn't happen in the first place, it also doesn't reconnect within 30 seconds as it claims in the log. And shortly before reconnecting, which took 40 minutes, it dumps a bunch of messages, and there's even a PONG in there.
After some debugging, the issue is probably not with irc
This shows that handling a PRIVMSG took 43 minutes. As long as the PRIVMSG isn't handled (meaning sent to the other bridges), no new messages can be handled, because the girc library seems to handle all messages the same way, so the PONG from freenode is not handled either. That gives the timeout, disconnects IRC, and so on. So matterbridge is sending a message to a bridge, and that takes forever (43 minutes). This message is sent from IRC to the gateway, but isn't handled by the gateway because it's still processing the previous message.
The previous message is coming from discord and also isn't handled by the gateway
The message before this is coming from irc
And is being stalled on discord, between receiving it on discord and actually sending it takes 43 minutes.
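The stall described above can be modelled with a small timing simulation. This is a sketch under assumptions, not code from girc or matterbridge: `event`, `minutes`, and `handledAt` are hypothetical names, and the one-minute arrival time of the PONG is made up for illustration. It only shows why a strictly serial dispatcher cannot process a PONG while an earlier handler is still blocked.

```go
package main

import (
	"fmt"
	"time"
)

// event is one incoming IRC line, with times given as offsets from the start.
type event struct {
	command  string
	arrived  time.Duration // when the line was read from the socket
	handling time.Duration // how long its handler blocks
}

// minutes is a small helper to keep the timeline readable.
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// handledAt returns, for a dispatcher that runs handlers strictly one after
// another, the moment at which each event's handler has finished.
func handledAt(events []event) []time.Duration {
	done := make([]time.Duration, len(events))
	var free time.Duration // when the dispatcher is next idle
	for i, e := range events {
		start := e.arrived
		if free > start {
			start = free // still busy with an earlier event
		}
		free = start + e.handling
		done[i] = free
	}
	return done
}

func main() {
	events := []event{
		{command: "PRIVMSG", arrived: 0, handling: minutes(43)}, // stalled relay
		{command: "PONG", arrived: minutes(1), handling: 0},     // arrives in time
	}
	for i, done := range handledAt(events) {
		fmt.Printf("%-8s arrived at %v, handled at %v\n",
			events[i].command, events[i].arrived, done)
	}
}
```

In this model the PONG is on the wire well before any ping timeout, but it is only seen once the stalled PRIVMSG handler returns, which would match a PONG showing up in the log just before the reconnect.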
The only command between You can find devbuilds with this change here: https://github.com/42wim/matterbridge/actions/runs/326278222 |
Finally another disconnect:
Not sure why we're getting rate limited. There's not that much going on… |
Ok, so my hunch was correct. Maybe some other apps are using the same webhook? Not yet sure how I can handle this issue generally in matterbridge when we get stalled. |
Matterbridge is the only app using the webhook. The other mentioned bridge used the same webhook before, but it isn't in use anymore.
FWIW, girc intentionally blocks for standard callbacks, even for internally managed things like PINGs. This is because a user of the library may want to respond to a command or server interaction before anything else does. E.g. STARTTLS upgrades cannot have any commands sent to the server between the initialization of the upgrade, the encryption of the connection, and the confirmation that the upgrade completed. I believe https://pkg.go.dev/github.com/lrstanley/girc#Caller.AddBg will work for matterbridge's use case, iirc. Though, if the handler is hanging, you will still have an issue of goroutine leaks if the thing running inside of the handler doesn't eventually finish.
Is there any way we can figure out why we're suddenly getting rate limited by Discord out of the blue?
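To illustrate the goroutine-leak caveat, here is a rough standard-library sketch of running a handler in the background with a deadline. It deliberately does not use girc: `dispatchBg`, `stalledSend`, `quickSend`, `timedOut`, and `demoLimit` are hypothetical names invented for this example, and the approach assumes the handler actually honours context cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// demoLimit is an arbitrary short deadline so the example finishes quickly.
const demoLimit = 50 * time.Millisecond

// dispatchBg starts handler in its own goroutine and returns immediately,
// so the caller (for example a read loop) can keep processing other events.
// The handler's context is cancelled after limit.
func dispatchBg(limit time.Duration, handler func(ctx context.Context) error) <-chan error {
	result := make(chan error, 1) // buffered: the goroutine can exit even if nobody reads
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), limit)
		defer cancel()
		result <- handler(ctx)
	}()
	return result
}

// stalledSend stands in for a relay call that hangs, such as a rate-limited
// request. It only returns early because it watches ctx.
func stalledSend(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(time.Hour):
		return nil
	}
}

// quickSend stands in for a relay call that completes right away.
func quickSend(ctx context.Context) error { return nil }

// timedOut reports whether err came from the deadline expiring.
func timedOut(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

func main() {
	stalled := dispatchBg(demoLimit, stalledSend) // does not block the caller
	fmt.Println("quick send error:", <-dispatchBg(demoLimit, quickSend))
	fmt.Println("stalled send timed out:", timedOut(<-stalled))
}
```

A handler that ignores its context would still leak its goroutine here, which is the problem described above.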
Possibly related qaisjp/go-discord-irc#57 |
@Lucki Can you remove the |
I changed the token in case it was somehow leaked but I already observed a disconnect with the new one. |
You will also need to wait for your rate limit to expire. It's per guild, and not per app. You can check when it's done by opening Server Settings -> Integrations -> Webhooks -> New Webhook, and seeing if the New Webhook button says "Internal Server Error" or not. |
I was able to create a new Webhook. The last rate limit was 6 hours ago so I'll do the same as before now, waiting a few days and try to catch disconnects :) |
According to a person in the Discord API chat (https://discordapp.com/channels/81384788765712384/381887113391505410/776912419351167008) there is a limit of 10 webhooks per channel. There does not seem to be an unreasonable limit for webhooks, server-wide. I managed to create 40 webhooks, 10 webhooks in 4 channels. I think this is new, at least in the past couple of years, as I remember there being a low limit in the past. This means that I think we can just have a single webhook per channel, instead of editing webhooks. The immediate workaround is to manually set up webhooks for each of your channels. That should make it work fine. Just provide It's the weekend, so maybe I'll get around to finally implementing #764 on Saturday. This should make it so that you don't need to manually create webhooks for each channel, and the bot will add or delete them as necessary. (This should also eliminate 1 HTTP request per message sent, as we no longer have to edit webhooks when a message is sent.) |
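The "one webhook per channel" idea can be pictured as a plain lookup table, sketched below. Everything in it is hypothetical: `webhookRegistry`, `forChannel`, the channel names, and the `example.invalid` URLs are placeholders, not matterbridge configuration or Discord API calls.

```go
package main

import "fmt"

// webhookRegistry maps a channel identifier to the webhook URL that was
// created for that channel ahead of time.
type webhookRegistry map[string]string

// forChannel returns the webhook dedicated to channelID. Because each
// channel has its own webhook, nothing needs to be edited before sending.
func (r webhookRegistry) forChannel(channelID string) (string, error) {
	url, ok := r[channelID]
	if !ok {
		return "", fmt.Errorf("no webhook configured for channel %q", channelID)
	}
	return url, nil
}

func main() {
	hooks := webhookRegistry{
		"general": "https://example.invalid/webhooks/general", // placeholder URL
		"dev":     "https://example.invalid/webhooks/dev",     // placeholder URL
	}
	for _, channel := range []string{"general", "offtopic"} {
		url, err := hooks.forChannel(channel)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("would send via", url)
	}
}
```

With a lookup like this, sending a message is a single request to the channel's own webhook, which is where the saving of one HTTP request per message mentioned above would come from.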
At least for me it seems to be working again with Freenode. Must have been a temporary issue on their side.
We haven't observed any disconnects so far. |
I'm seeing similar issues, unrelated to IRC. The bridge seems to go unresponsive for about 10 minutes at times. I'm using the API to bridge with MatterBukkit to a Minecraft server (the freezes incidentally cause the server to crash; I will report that to the plugin maintainer).
Config: matterbridge.toml
Logs: The logs don't seem to report anything.
The next time it happens, please try what is written here: qaisjp/go-discord-irc#57 (comment) |
See #1255 and qaisjp/go-discord-irc#57 Webhook edits gets ratelimited which cause other problems with matterbridge. Disabling for now.
I'm going to close this; the issue has been traced to the Discord webhook edit rate limits.
Describe the bug
The bridge is connected to one channel on Freenode and when there's stuff going on in the channel we can frequently see the bridge disconnecting.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The bridge stays connected and reliably bridges the messages to the other configured `inouts`.
Screenshots/debug logs
If applicable, add screenshots to help explain your problem. Use logs from running `matterbridge -debug` if possible.
Environment (please complete the following information):
version: 1.18.3 8b26e42a
Additional context
Please add your configuration file (be sure to exclude or anonymize private data (tokens/passwords))