This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

redacted #4565

Closed
ghost opened this issue Feb 5, 2019 · 4 comments
Labels
z-privacy-sprint (Deprecated Label)

Comments

@ghost

ghost commented Feb 5, 2019

redacted

@ara4n
Member

ara4n commented Oct 10, 2019

As you can see from the timeline here, this issue is very much on our radar, although we haven't fed back on the points raised here that we've since fixed (oops).

> All joins and leaves in every room are stored, these entries consist of user_id, access_token, device_id, ip, user_agent, last_seen, timestamp.

We no longer store access_token, device_id, user_agent, last_seen for users (unless that user's session is still active), as of #6098.

It's inevitable that we track the user_id and timestamp for when users join/leave rooms in order for the room history to actually function. However, MSC1228 will help us obfuscate the user_id, and it is coming shortly.

> Logs contain vast amount of information.

We have always gone to great lengths to avoid logging any sensitive data (e.g. message contents, secrets, key data etc) in logs. However, log lines do include user IDs and room IDs required to trace problems. Synapse doesn't run in a log minimisation configuration by default because it's still not stable enough to run unattended by itself, flying blind. We need the logs to help people out when things break. As soon as we hit a sufficient level of stability we'll change the default log level for sure (and we are headed in that direction).

> Remove anything that isn't absolutely necessary from them and either implement a user-friendly mechanism (or documentation) to manage them, purge them automatically after a short period of time (e.g. 7 days) or don't store them at all.

Synapse doesn't dictate how you store your logs or what retention scheme you apply. Each package of Synapse does it differently (systemd; python logging; docker logs etc), and it's up to the sysadmin to specify the log rotation & retention policy. They can also switch the log level to WARN if they want, which hides all PII.
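To make the two knobs mentioned above concrete (log level and retention), here is a minimal sketch using only Python's standard `logging` module. It is not Synapse's own log config; the logger name, file path and the 7-day figure are assumptions chosen for illustration. Note that raising the level only drops INFO/DEBUG lines: whether the remaining lines are free of PII depends on what the application logs at WARNING and above.

```python
import logging
import logging.handlers
import os
import tempfile


def build_logger(log_path: str, retention_days: int = 7) -> logging.Logger:
    """Return a logger that records WARNING and above and keeps a bounded history."""
    logger = logging.getLogger("homeserver_example")  # hypothetical logger name
    logger.setLevel(logging.WARNING)  # INFO and DEBUG records are dropped
    logger.propagate = False

    # Rotate at midnight; on rollover, only `retention_days` old files are kept.
    handler = logging.handlers.TimedRotatingFileHandler(
        log_path, when="midnight", backupCount=retention_days, delay=True
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


log_file = os.path.join(tempfile.mkdtemp(), "homeserver.log")  # assumed path
log = build_logger(log_file)
log.info("request from @alice:example.org")  # below WARNING, never written
log.warning("database connection retried")   # written to the file
```

Deployments that log via systemd or Docker would apply the equivalent retention setting in those tools instead.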

> Other things like redacted and deleted events, accounts, sent files.

Redacted/deleted events now get pruned after N days as of #5934. Deleting files referenced by redacted events is harder, but we're working on it.
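As a rough illustration of what "pruned after N days" can mean in practice, the sketch below blanks the stored content of redacted events once they are older than a retention window. The table layout, column names and the 7-day value are hypothetical and are not taken from Synapse's schema or from #5934.

```python
import sqlite3
import time

DAY_MS = 24 * 60 * 60 * 1000


def prune_redacted(conn: sqlite3.Connection, retention_days: int, now_ms: int) -> int:
    """Blank the content of events redacted before the retention cutoff."""
    cutoff = now_ms - retention_days * DAY_MS
    cur = conn.execute(
        "UPDATE events SET content = NULL "
        "WHERE redacted = 1 AND content IS NOT NULL AND redacted_at_ms < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of events whose content was removed


# Hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT PRIMARY KEY, content TEXT, "
    "redacted INTEGER, redacted_at_ms INTEGER)"
)
now = int(time.time() * 1000)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("$old", "secret", 1, now - 10 * DAY_MS),  # redacted 10 days ago
        ("$new", "secret", 1, now - 1 * DAY_MS),   # redacted yesterday
        ("$live", "hello", 0, None),               # never redacted
    ],
)
pruned = prune_redacted(conn, retention_days=7, now_ms=now)
```

Only the event redacted outside the window loses its content; media files referenced by such events would need a separate clean-up step, which is the harder part mentioned above.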

@richvdh richvdh removed the phase:2 label Oct 1, 2020
@ghost ghost changed the title Metadata resistance redacted Jul 15, 2022
@ghost ghost closed this as completed Jul 15, 2022
@3nprob
Contributor

3nprob commented Jul 15, 2022

@ghost Why closed? Logs still seem relevant.

@ghost

ghost commented Dec 23, 2022

Hello
Could somebody reopen the issue, maybe @ara4n? Sorry for the ping in advance, but I still believe this issue is relevant today.
On the other hand, is this a meta issue tracking the problem across every component (the Matrix spec, Synapse, Element...) or just the Synapse part of it? I'm asking because I don't think we have a tracking list for this, and since this is a complex issue, maybe we should. I can make a list if you want and post it here.

@3nprob
Contributor

3nprob commented Mar 17, 2023

@NebulaOnion I think you can feel free to reopen this as a new issue (rather than yak-shaving it in a thread here).

FWIW if you want to reuse, penultimate version of this issue:

Currently synapse (and AFAIK the whole Matrix ecosystem) doesn't attempt to minimize metadata gathering in any way. This is one of its biggest issues in terms of security and privacy. It makes Matrix not a sensible option for people who care about these values: they have to choose between privacy/security and decentralization/modern FOSS protocol, and I think the latter values are significantly less important. In the next few weeks Matrix should get to the state where there's bandwidth available to make these basic things right and only then work on things of less importance like new features, app rewrites and dendrite. I think it's a good strategy to first make the base robust and only then move further.

Incomplete list of unnecessary data gathered by synapse:

- Database stores unnecessary information. All joins and leaves in every room are stored; these entries consist of user_id, access_token, device_id, ip, user_agent, last_seen, timestamp. There's most likely more. These should be truncated to only contain information that is truly necessary and shouldn't be stored longer than necessary.
- Logs contain a vast amount of information. Remove anything that isn't absolutely necessary from them and either implement a user-friendly mechanism (or documentation) to manage them, purge them automatically after a short period of time (e.g. 7 days) or don't store them at all. Logs in production releases of synapse shouldn't contain debugging information, but only information required for security reasons, e.g. an audit after a breach, with guidance in the documentation on how to secure this data while minimizing metadata retention.
- Other things like redacted and deleted events, accounts, sent files.
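One way to approach the log item in the list above, sketched under assumptions: a standard-library `logging.Filter` that masks Matrix-style user and room IDs before a line is written. The regular expression is a rough approximation of the ID format and the logger name is made up; nothing here is an existing Synapse feature, and masking IDs trades away the ability to trace a problem back to a specific user or room.

```python
import io
import logging
import re

# Rough pattern for IDs such as @user:example.org or !room:example.org (an approximation).
MATRIX_ID = re.compile(r"[@!][^\s:]+:[^\s,;)]+")


class RedactMatrixIds(logging.Filter):
    """Replace Matrix-style user and room IDs in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so IDs passed as arguments are masked too.
        record.msg = MATRIX_ID.sub("<redacted>", record.getMessage())
        record.args = None  # the message is already fully formatted
        return True  # keep the record, now without identifiers


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactMatrixIds())
logger = logging.getLogger("redaction_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("%s joined %s", "@alice:example.org", "!abc123:example.org")
output = stream.getvalue()
```

Attaching the filter to the handler rather than the logger means it applies to every record that handler writes, regardless of which module emitted it.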

I didn't investigate this thoroughly and there's likely more; if you know of anything else, don't forget to share in the comments.

Since synapse requires other services for operation, like a reverse proxy, coturn and postgres (I'm not sure if python or anything else logs anything), these should also be dealt with, either by removing these dependencies or by crafting good documentation together with tools that will enable even a person without an infosec and sysadmin background to set it up easily, properly and fast using only that documentation to learn. This is particularly important as Matrix aims to have a well balanced ecosystem of smaller servers avoiding the common problem of federation.

Users should be sufficiently and visibly informed in the documentation of anything that is stored and about possible options to modify this behavior, e.g. log removal and how it should be done.

Like Arathorn mentioned, parts of that are no longer relevant.

This issue was closed.

5 participants