s3 sync does not delete excluded files #4923

Open
stevenkaras opened this issue Feb 5, 2020 · 5 comments
Labels: feature-request (A feature should be added or improved.), p3 (This is a minor priority issue), s3filters, s3sync

Comments

@stevenkaras
Contributor

The --delete flag on the s3 sync command does exactly what the manual says:

Files that exist in the destination but not in the source are deleted during sync.

This means it will not delete files in the destination if they still exist in the source, even if they were excluded. To reproduce:

aws s3 mb s3://sync-delete-exclude
mkdir /tmp/sync-delete-exclude
touch /tmp/sync-delete-exclude/{1,2,3}
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude
# we expect this to delete file 3 from the bucket
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete

Our use case is to sync only the last week/month/year of files out of an S3 bucket using --exclude and --include filters. Files that are excluded are not deleted, so we must run another step afterwards to delete them. This seems far from the intent of the command, which is "make the destination look like the source after filtering".
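For the reproduction above, where the destination is the bucket, that extra step can be sketched with the existing `aws s3 rm` command, which accepts the same `--exclude`/`--include` filters along with `--recursive` and `--dryrun`. This is a minimal sketch, not an official workaround: it assumes you pass exactly the patterns you gave to `--exclude` on the sync. The function name `delete_excluded` and the `APPLY` environment variable are hypothetical.

```shell
# Hypothetical helper (not part of the AWS CLI): remove objects under TARGET
# that match the patterns excluded from a previous `aws s3 sync --delete`.
#   usage: delete_excluded s3://bucket[/prefix] PATTERN...
# Runs with --dryrun unless APPLY=1 is set (APPLY is an assumption of this sketch).
delete_excluded() {
  target=$1
  shift
  # Turn each PATTERN into "--include PATTERN": append the pair, drop the original.
  for pattern do
    set -- "$@" --include "$pattern"
    shift
  done
  if [ "${APPLY:-0}" != 1 ]; then
    set -- "$@" --dryrun
  fi
  # Exclude everything first, then re-include only the patterns skipped by the sync.
  aws s3 rm "$target" --recursive --exclude "*" "$@"
}
```

Running `delete_excluded s3://sync-delete-exclude 3` only previews removing object `3`; prefixing the call with `APPLY=1` performs the deletion.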

klaytaybai self-assigned this Feb 6, 2020
@klaytaybai

I think we should treat this as a feature request, with some adjustments to avoid breaking changes. In my opinion, the current documentation is valid, if somewhat confusing. It specifies that --exclude applies at the command level, so one could reasonably argue that flags such as --delete shouldn't apply to excluded files.

--exclude (string) Exclude all files or objects from the command that matches the specified pattern.

I think a good alternative might be to add a --delete-excluded flag that covers your use case. Thoughts? Without an additional option, I don't think we can make the change without breaking existing users.
For example:

rm /tmp/sync-delete-exclude/2
# we expect this to delete file 2 and now also file 3
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete --delete-excluded
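The difference between today's behavior and the proposed flag can be made concrete with a small model. This is a hypothetical sketch, not the AWS CLI's implementation: it treats each side as a plain list of file names, handles only --exclude patterns (no --include re-inclusion), and uses the shell's `case` glob matching as a stand-in for the CLI's own pattern matching. `plan_deletes`, `is_excluded`, and the `current`/`excluded` modes are invented names.

```shell
# Hypothetical model of which destination files `sync --delete` removes.
# Not the AWS CLI's code; lists are space-separated file names.
#   usage: plan_deletes MODE "SRC FILES" "DEST FILES" [EXCLUDE_PATTERN...]
#   MODE=current  : excluded files are skipped entirely (behavior reported here)
#   MODE=excluded : excluded files are also deleted (proposed --delete-excluded)

# Succeeds if NAME matches any of the given glob patterns.
is_excluded() {
  name=$1
  shift
  for pat do
    case $name in
      $pat) return 0 ;;
    esac
  done
  return 1
}

plan_deletes() {
  mode=$1 src=$2 dest=$3
  shift 3
  set -f                      # split lists on spaces without pathname globbing
  for f in $dest; do
    if is_excluded "$f" "$@"; then
      # Excluded: ignored today, deleted under the proposed flag.
      if [ "$mode" = excluded ]; then
        echo "$f"
      fi
    else
      # Not excluded: deleted only when missing from the source.
      case " $src " in
        *" $f "*) ;;          # still in the source: kept
        *) echo "$f" ;;
      esac
    fi
  done
  set +f
}
```

With the state from the example above (file 2 removed locally, file 3 excluded), `plan_deletes current "1 3" "1 2 3" 3` prints only `2`, while `plan_deletes excluded "1 3" "1 2 3" 3` prints `2` and `3`.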

klaytaybai added the feature-request and s3sync labels Feb 6, 2020
@stevenkaras
Contributor Author

stevenkaras commented Feb 9, 2020

Given the close parallel to rsync, --delete-excluded is the best option. I would suggest borrowing the wording from rsync's documentation, as it makes the interaction of the --delete and --exclude flags explicit, which is currently only implied.

@zappallot

We have a slightly similar problem. Our command looks like this:
aws s3 sync s3://<bucket> /<folder> --delete --exclude "*" --include "*.py"
This runs in Kubernetes in a sidecar container; the main container writes *.pyc files into the same folder, and those files are then deleted by the sync command.
We'd want a --delete-only-included flag that only deletes files in the target folder that match the --include patterns of the sync command. (Hope that makes sense.)
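The rule proposed here can be stated precisely with a small model. This is a hypothetical sketch: `--delete-only-included` is not an existing AWS CLI option, `delete_only_included` is an invented name, file lists are plain space-separated names, and shell `case` globs stand in for the CLI's own pattern matching.

```shell
# Hypothetical model of the proposed --delete-only-included flag (not a real
# AWS CLI option): a destination file is deleted only if it is missing from
# the source AND matches at least one include pattern.
#   usage: delete_only_included "SRC FILES" "DEST FILES" INCLUDE_PATTERN...
delete_only_included() {
  src=$1 dest=$2
  shift 2
  set -f                          # no pathname globbing while splitting lists
  for f in $dest; do
    case " $src " in
      *" $f "*) continue ;;       # still in the source: never deleted
    esac
    for pat do
      case $f in
        $pat) echo "$f"; break ;; # missing and included: delete
      esac
    done
  done
  set +f
}
```

For example, `delete_only_included "app.py" "app.py old.py app.pyc" '*.py'` prints only `old.py`; the `app.pyc` file written by the main container is left alone because it does not match `*.py`.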

@ticteam

ticteam commented Dec 13, 2021

Hi,
what's the status of this request?
I also need this feature: I want --delete to apply only to a specific folder, so that folders other than the one given in aws s3 sync /<folder1> s3://<bucket>/<folder1> --delete are not affected:

s3://<bucket>/<folder1> synced with --delete flag
s3://<bucket>/<folder2> not touched
s3://<bucket>/<folder3> not touched

@atz

atz commented Dec 20, 2021

I agree with the need for a new, explicit option. We rely on the current non-deleting behavior of --exclude so that distributed processes aggregating data back into the same directory don't obliterate each other's output.


6 participants