Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WhatsApp support #711

Merged
merged 25 commits into from
Feb 21, 2019
Merged

WhatsApp support #711

merged 25 commits into from
Feb 21, 2019

Conversation

KrzysztofMadejski
Copy link
Contributor

Alpha version. Discussion is on #475

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some fixes!

P.S. share your ideas, feedbacks or issues with us at https://github.com/fixmie/feedback (this message will be removed after the beta stage).

@42wim
Copy link
Owner

42wim commented Feb 12, 2019

First of all, great job! Even more if this is your first go program 👍
Also extra points for editing the README and the matterbridge.toml.sample 🥇

Regarding the PR, some general comments before I look at the code:

  • You can ignore all the fixmie messages for now, exported functions should have comments, see https://golang.org/doc/effective_go.html#commentary (so it would be nice if those could be done, but no blocker)
  • The travis failure can not be ignored, the reason it fails now is because you didn't update the vendor directory. You can do this using go mod vendor
  • About the code, can you restructure it like the slack code, put the handlers in handlers.go, helper functions in helpers.go and the main bridge functions in ``whatsapp.go`

We're are also using golangci-lint for linting the code, I suggest you install this and run it from the matterbridge directory. It'll show you what needs to be improved.

If some of this output isn't clear, don't hesitate to ask for help here or on the chat.

@KrzysztofMadejski
Copy link
Contributor Author

On it, thank you!

My main concern right now is a UI one: a channel on Slack is shown as a unique ID string of the WhatsApp channel, which doesn't look great. I have to use unique IDs for groups, because WhatsApp allows for non-unique group names. I'm wondering if there is any possibility that in the config file there will be group ID, but Slack (and other bridges) would show some kind of group label? Did you have such problem for any of the other bridges/chats?

chat-slack

@KrzysztofMadejski
Copy link
Contributor Author

The travis failure can not be ignored, the reason it fails now is because you didn't update the vendor directory. You can do this using go mod vendor

Shouldn't downloading vendors be part of the build process? Keeping vendor files that can be accessed remotely in this repo sounds like a bad practice to me.

@KrzysztofMadejski
Copy link
Contributor Author

Question: When func (b *Bwhatsapp) Disconnect() is being called?

In WhatsApp I can logout, but I'm not sure if this is what we want.

@KrzysztofMadejski
Copy link
Contributor Author

About the code, can you restructure it like the slack code, put the handlers in handlers.go, helper functions in helpers.go and the main bridge functions in ``whatsapp.go`

I separated also a senders.go file to put there all the handling coming from the relay into WhatsApp. It will grow substantially. Let me know if you would prefer it to be named differently.

@42wim
Copy link
Owner

42wim commented Feb 13, 2019

Some quick comments, later some more :-)

  • vendor is added because

    • reproducible builds and backup (upstream can go poof anytime)
    • we need to build against older versions of go which don't support the go modules but do support vendor
  • Disconnect() isn't used for now, it's a placeholder, so no worries about that.

  • senders.go (or whatever you choose) is fine, but please keep Send() in the main whatsapp.go

@KrzysztofMadejski
Copy link
Contributor Author

All checks have passed

@KrzysztofMadejski
Copy link
Contributor Author

But please hold some time with merging that in. I see that it is quite easy for telephone numbers to be exposed via relay. There might be no way around it, but at least I'd like to research that a little bit more and document for users to be aware of. #unintentedconsequences

@42wim
Copy link
Owner

42wim commented Feb 13, 2019

My main concern right now is a UI one: a channel on Slack is shown as a unique ID string of the WhatsApp channel, which doesn't look great. I have to use unique IDs for groups, because WhatsApp allows for non-unique group names.

I don't understand this: "a channel on Slack is shown".
You mean "a channel on slack is shown on whatsapp" ? because the attached screenshot is from slack? Need some more examples/screenshot to help :-)

I see that it is quite easy for telephone numbers to be exposed via relay

You mean by running with -debug ? or because numbers can't get mapped to users?

But please hold some time with merging that in.

Ok, please also remove the large commented part in the Send() in whatsapp.go and the slack references in New(). You also probably won't need the lru cache.
There's also a race condition in usersAvatars and users map, but we can fix this after the initial merge.

@KrzysztofMadejski
Copy link
Contributor Author

KrzysztofMadejski commented Feb 14, 2019

Sorry, I should have had elaborated more. One photo is worth a thousand words:

whatsapp

1. Channel Label

My main concern right now is a UI one: a channel on Slack is shown as a unique ID string of the WhatsApp channel, which doesn't look great. I have to use unique IDs for groups, because WhatsApp allows for non-unique group names.

I'm using a RemoteNickFormat="[{LABEL} #{CHANNEL}] @{NICK}" which results in [CfP #48111222333-1549986983@g.us] @Virus APP [10:45 AM] being shown as sender.

I don't know if WhatsApp is the first case of long unique channel id or did you have this issue before? What I'd love to have is an ability to set a channel label that would be used here instead of channel id when set.

    [[gateway.inout]]
    account="whatsapp.cfall"
    channel="48111222333-1549986983@g.us"
    channelLabel = "Code for Pakistan"

That would result in above RemoteNickFormat to be formatted as [CfP #Code for Pakistan] @Virus APP [10:45 AM]. But maybe you know a smart hack to handle it without introducing a new setting.

2. Unintended consequences

I see that it is quite easy for telephone numbers to be exposed via relay

WhatsApp primarily identify users by telephone numbers. Altough internally there are independent unique ids being used, still I saw that telephone numbers pop out here and then.

To bring the picture again:

whatsapp

Exposing numbers happens in two places (that I found so far):

  • group/channel ID contains the telephone number of the owner
  • mentioning other people on WhatsApp puts their telephone numbers as a mention "code".

While for WhatsApp using numbers is natural, relaying messages to many other bridges is an issue in my opinion. Ppl might not want their phone numbers to be exposed to "strangers" on public chats. How was it solved in Telegram?

Issue 1 could be solved by above channelLabel mechanism and issue 2 I can handle easily in the code [update: fixed already]. Regardless I think that somewhere in the Wiki we should put a notice that it might happen that numbers might be shared involuntarily to other bridges.

That made me also think about the EU's GDPR regulations and user consent before you share any of their personal data, but that's a bigger topic.


Thinking further into unintended consequences.. WhatsApp imposed a maximum limit of members in the group to stop disinformation from spreading. Disinfo bubbles spreading via WhatsApp groups and forwarding resulted in killings in India and Philippines. This bridge overcomes some of those limits. How do we address it? I might start another issue for #unintentedconsequences. Maybe do a review along https://ethicalos.org/ guidelines.

3. Code

Ok, please also remove the large commented part in the Send() in whatsapp.go and the slack references in New(). You also probably won't need the lru cache.

Cleaned the code.

There's also a race condition in usersAvatars and users map, but we can fix this after the initial merge.

Where is the race? I write to it in one place and time, and then only read from them.

But for the future I will need to update them as ppl join the channels, so suggestion how to guard races would be welcomed.

Provide meaningful label outside and don't expose user numbers.
@patcon
Copy link
Contributor

patcon commented Feb 14, 2019

Perhaps related to "1. channel label" aspect: #614

A robust templating engine would allow people to set a high-level default RemoteNickFormat that took special-cases into account, without solving it in core:

[{% label %}{% #if protocol != "whatsapp" %} #{% channel %}{% /if %}] @{% nick %}

Note: The above is a made-up pseudo language.

@42wim
Copy link
Owner

42wim commented Feb 15, 2019

1. Channel Label

My main concern right now is a UI one: a channel on Slack is shown as a unique ID string of the WhatsApp channel, which doesn't look great. I have to use unique IDs for groups, because WhatsApp allows for non-unique group names.

I'm using a RemoteNickFormat="[{LABEL} #{CHANNEL}] @{NICK}" which results in [CfP #48111222333-1549986983@g.us] @Virus APP [10:45 AM] being shown as sender.

I don't know if WhatsApp is the first case of long unique channel id or did you have this issue before? What I'd love to have is an ability to set a channel label that would be used here instead of channel id when set.

Well, I think using {CHANNEL} in RemoteNickFormat is an edge case. I don't see it used in issues raised here (nor do I use it myself). Channels are mostly bridged 1:1 so you know what channel you've bridged and it can put it in the topic for example, no need to put it in every message ?
(Telegram also has those weird numeric ID channel names)

But I agree if you do use it, a channel-label feature would be nice (but it doesn't exist).

But maybe you know a smart hack to handle it without introducing a new setting.

A possible hack to achieve this is to create a new whatsapp bridge per channel, this way you can use the existing label option.

2. Unintended consequences

I see that it is quite easy for telephone numbers to be exposed via relay

WhatsApp primarily identify users by telephone numbers. Altough internally there are independent unique ids being used, still I saw that telephone numbers pop out here and then.

To bring the picture again:

whatsapp

Exposing numbers happens in two places (that I found so far):

  • group/channel ID contains the telephone number of the owner
  • mentioning other people on WhatsApp puts their telephone numbers as a mention "code".

While for WhatsApp using numbers is natural, relaying messages to many other bridges is an issue in my opinion. Ppl might not want their phone numbers to be exposed to "strangers" on public chats. How was it solved in Telegram?

Afaik telegram always returns a name and if the name is empty (when a user hasn't set a public name) we return "unknownUser"

// if we really didn't find a username, set it to unknown
if rmsg.Username == "" {
rmsg.Username = unknownUser
}

I don't use telegram myself, so I haven't seen this in action.

Issue 1 could be solved by above channelLabel mechanism and issue 2 I can handle easily in the code [update: fixed already]. Regardless I think that somewhere in the Wiki we should put a notice that it might happen that numbers might be shared involuntarily to other bridges.

Sounds good.

That made me also think about the EU's GDPR regulations and user consent before you share any of their personal data, but that's a bigger topic.

Thinking further into unintended consequences.. WhatsApp imposed a maximum limit of members in the group to stop disinformation from spreading. Disinfo bubbles spreading via WhatsApp groups and forwarding resulted in killings in India and Philippines. This bridge overcomes some of those limits. How do we address it? I might start another issue for #unintentedconsequences. Maybe do a review along https://ethicalos.org/ guidelines.

You raise some good points, but I have no idea how we could control this. I think this point should also be raised at the library, maybe messages have a "forwarding" count somewhere. Another option here is to detect if the message is a forwarded message and drop it (but anyway who wants to overrule this can just modify the code).

3. Code

Ok, please also remove the large commented part in the Send() in whatsapp.go and the slack references in New(). You also probably won't need the lru cache.

Cleaned the code.

There's also a race condition in usersAvatars and users map, but we can fix this after the initial merge.

Where is the race? I write to it in one place and time, and then only read from them.

The one in users map that I thought isn't there, you're indeed not writing again to the users map.
The usersAvatars is because you're filling the map in a go function that could still be running when receive a message. (in L83 of the diff)

You can run go build -race which will show you the detected races as you run matterbridge with the whatsapp bridge.

But for the future I will need to update them as ppl join the channels, so suggestion how to guard races would be welcomed.

Look at the slack bridge, it adds a mutex

usersMutex sync.RWMutex

Which then get a R(ead)Lock when data is read, and a Lock when you also write data to your map

func (b *Bslack) getUser(id string) *slack.User {
b.usersMutex.RLock()
user, ok := b.users[id]
b.usersMutex.RUnlock()
if ok {
return user
}
b.populateUser(id)
b.usersMutex.RLock()
defer b.usersMutex.RUnlock()
return b.users[id]
}

So you best use those functions to modify/read the map instead of directly accessing the map.
More info on: https://gobyexample.com/mutexes and https://medium.com/golangspec/sync-rwmutex-ca6c6c3208a0

Regarding the PR itself some small nitpicks left before it can be merged:

  • please remove the .gitignore file
  • also the GetBoolOrDefault() function is not necessary, especially because you're using false as default (and the default of bool is always false if not initialized).

@KrzysztofMadejski
Copy link
Contributor Author

Channel Label

I'm using a RemoteNickFormat="[{LABEL} #{CHANNEL}] @{NICK}" which results in [CfP #48111222333-1549986983@g.us] @Virus APP [10:45 AM] being shown as sender.

Well, I think using {CHANNEL} in RemoteNickFormat is an edge case. I don't see it used in issues raised here (nor do I use it myself). Channels are mostly bridged 1:1 so you know what channel you've bridged and it can put it in the topic for example, no need to put it in every message ?

I use Matterbridge to link different communities together. I'd like them to know where does message originated from so they can head to another chat to get into 1-on-1 direct messages or other channels that are not relayed. It's true though that in most cases the channel names are very similar. I might drop it for simplicity and to solve above problem temporarily.

But I agree if you do use it, a channel-label feature would be nice (but it doesn't exist).

I've separated it into an issue, will work on it in the free time -> #725

Code

The usersAvatars is because you're filling the map in a go function that could still be running when receive a message. (in L83 of the diff)

What you said is true, but I don't see there any race conditions. The write function fills in the map at its own pace. It is the only thread that does the writing. The message handler reads from the usersAvatars - it will either:

  1. return url for avatar -> cool we have it!
  2. return nil if this wasn't set (yet) -> cool, we don't have it, we will get it next time

The setting of the url is done at once in the map, atomically, so no need to guard writing there.

I understand that Slack might need mutexes because you have multiple writers. Each message being handled might result in modifying the users map. Here there is no such issue


  • please remove the .gitignore file

I added it to make it harder to accidentally add developer's matterbridge.toml file with tokens. That's a standard procedure to ignore "secrets". But I removed it according to your suggestion.


  • also the GetBoolOrDefault() function is not necessary, especially because you're using false as default (and the default of bool is always false if not initialized).

It is not neccessary, but it was making the code cleaner for me. Removed.

@42wim
Copy link
Owner

42wim commented Feb 20, 2019

Thanks for the changes.

For me the current code can be merged, let me know if it's ok for you too.
I'm going to release a 1.14.0 soon, would be nice if this could be included in that release.

What you said is true, but I don't see there any race conditions. The write function fills in the map at its own pace. It is the only thread that does the writing.

Made a simple example of the issue on https://play.golang.org/p/ow2ik2qoTNA
Build that code with go build -race and run it, it'll show a datarace as below in the comment.
(in your case the go thread will probably be finished before a textmessage is received)

@KrzysztofMadejski
Copy link
Contributor Author

It's good to go from my side. Thank you a lot for taking the time to peer-review it!

@Tjird
Copy link

Tjird commented Feb 21, 2019

Maybe handy to make a example how to save the session file with a volume (For Docker).

@KrzysztofMadejski
Copy link
Contributor Author

@Tjird I'm not an expert of Docker, could you propose a docker-compose.yml that could help?

I added a mention of the need to store the session file to Azure deploy's guide https://github.com/42wim/matterbridge/wiki/Deploy:-Azure

@Tjird
Copy link

Tjird commented Feb 21, 2019

@KrzysztofMadejski I'm making a PR myself for that small thing if this PR is merged.

@42wim 42wim merged commit 55e7906 into 42wim:master Feb 21, 2019
@42wim
Copy link
Owner

42wim commented Feb 21, 2019

Let's party! 👍
@KrzysztofMadejski thanks again!

@KrzysztofMadejski KrzysztofMadejski deleted the feature_475/whatsapp branch February 21, 2019 20:32
@42wim 42wim added this to the 1.14.0 milestone Feb 22, 2019
zeridon pushed a commit to zeridon/matterbridge that referenced this pull request Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants