Add the ability to create a ZIP file based on document IDs #131

Closed
reganwolfrom opened this issue May 30, 2023 · 5 comments

@reganwolfrom
Member

For now, this request would take a list of document IDs and start an asynchronous process to retrieve those documents and compress them into a ZIP file. Child documents would not be included automatically at this time; if a document has child documents, they would need to be requested explicitly by their document IDs.

Request Endpoint:
POST /compress
{
  "documentIds": ["aaa-aaa-aaa-aaa", "aaa-bbb-bbb-bbb"]
}
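As a rough illustration of the request body above, here is a minimal Python sketch of how a client might build it. The helper name `build_compress_request` and its validation rules (non-empty strings, duplicates dropped) are assumptions for illustration only, not part of the proposal.

```python
import json


def build_compress_request(document_ids):
    """Return a JSON body shaped like the proposed POST /compress request.

    Hypothetical helper: the validation and de-duplication rules are assumptions.
    """
    for doc_id in document_ids:
        if not isinstance(doc_id, str) or not doc_id.strip():
            raise ValueError(f"invalid document ID: {doc_id!r}")
    # Drop duplicates while keeping the caller's order.
    unique_ids = list(dict.fromkeys(document_ids))
    if not unique_ids:
        raise ValueError("at least one document ID is required")
    return json.dumps({"documentIds": unique_ids})


if __name__ == "__main__":
    print(build_compress_request(["aaa-aaa-aaa-aaa", "aaa-bbb-bbb-bbb"]))
```

Child documents are not expanded here, which matches the description: each one would have to appear in the list by its own ID.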

This would trigger the process that creates the ZIP. It could run as a Lambda function or an AWS Fargate task, depending on what makes more sense. We may need SQS for this processing, and possibly a compression task ID and a status endpoint to report progress. Alternatively, we could return an S3 object URL up front and have the compression replace the empty object with the finished ZIP file; the risk is that the object could be fetched while compression is still ongoing and contain only some of the selected documents.
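One way to sidestep the partial-ZIP concern is to finish building the archive before anything is written under the key the requester will read. The sketch below only illustrates that idea: a plain `dict` stands in for the S3 bucket, and `compress_documents` and `publish_when_complete` are hypothetical names, not functions from the codebase.

```python
import io
import zipfile


def compress_documents(documents):
    """Build a complete ZIP archive in memory and return its bytes.

    `documents` maps an entry name to its content (bytes or str).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in documents.items():
            archive.writestr(name, content)
    # Leaving the `with` block finalizes the archive's central directory.
    return buffer.getvalue()


def publish_when_complete(store, final_key, documents):
    """Write the archive under final_key only after it is fully built.

    `store` is a dict standing in for a bucket (an assumption for this sketch).
    """
    store[final_key] = compress_documents(documents)


if __name__ == "__main__":
    bucket = {}
    publish_when_complete(bucket, "tempfiles/example.zip", {"a.txt": "hello"})
    print(len(bucket["tempfiles/example.zip"]), "bytes written")
```

With this ordering, a reader either finds no object at the key or finds the finished archive, so a status check can be as simple as testing whether the key exists.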

Once compression has completed (and the requestor can determine that it has), we may want them to access the ZIP through a signed URL. Alternatively, we may need tighter control over how many times the file can be accessed. We could add an expiration time, or add a delete endpoint if we have an ID that is separate from the URL.

So if we need a "download id" for each compression event, we could use POST /downloads/compress to start it, GET /downloads/{downloadId} to get status and/or create and retrieve an S3 signed URL, and DELETE /downloads/{downloadId} to delete the ZIP file.
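To make the "download id" lifecycle concrete, here is a small in-memory Python sketch of the three operations. The class `DownloadRegistry`, the status values, and the record fields are all assumptions made for illustration; the real service would back this with its own storage and signed URLs.

```python
import uuid


class DownloadRegistry:
    """In-memory stand-in for the proposed /downloads endpoints (hypothetical)."""

    def __init__(self):
        self._downloads = {}

    def create(self, document_ids):
        """Mirror POST /downloads/compress: register a job and return its ID."""
        download_id = str(uuid.uuid4())
        self._downloads[download_id] = {
            "documentIds": list(document_ids),
            "status": "IN_PROGRESS",
            "url": None,
        }
        return download_id

    def mark_complete(self, download_id, url):
        """Called by the compression worker once the ZIP is finished."""
        record = self._downloads[download_id]
        record["status"] = "COMPLETE"
        record["url"] = url

    def get(self, download_id):
        """Mirror GET /downloads/{downloadId}: a copy of the record, or None."""
        record = self._downloads.get(download_id)
        return dict(record) if record is not None else None

    def delete(self, download_id):
        """Mirror DELETE /downloads/{downloadId}: True if something was removed."""
        return self._downloads.pop(download_id, None) is not None
```

A client would call `create`, poll `get` until the status is `COMPLETE`, fetch the URL, and optionally call `delete` when it is done with the file.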

First step will be to determine if it's even possible to do without a "download id"; however, it may make more sense to just use a "download id" with the GET and DELETE endpoints, to enable more control by an application with this process.

@reganwolfrom reganwolfrom added the enhancement New feature or request label May 30, 2023
@reganwolfrom
Member Author

@formkiqMike, can you take a look at this description and see if any info should be changed? Thanks!

@mfriesen
Member

I think it should work like Google Drive, where you select a list of documents and those documents are then zipped up into a temporary file and downloaded.

This could probably be piggybacked on the DocumentActionsProcessor and the DocumentActionsQueue.

The response would return a presigned URL to a file in the Documents Bucket tempdata folder. A lifecycle rule would then need to be added to the Documents Bucket, like the one the OCR bucket already has.

  LifecycleConfiguration:
    Rules:
      - Id: ExpiryRule
        Prefix: tempfiles/
        Status: Enabled
        ExpirationInDays: 1
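For a sense of what `ExpirationInDays: 1` means for a temporary ZIP, the Python sketch below estimates when an object becomes eligible for expiration. It assumes the behavior described in the S3 documentation as I understand it (creation time plus the configured days, rounded up to the next midnight UTC); actual removal can happen later, and the function name `estimated_expiry` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def estimated_expiry(created_at, expiration_in_days=1):
    """Estimate when a lifecycle rule makes an object eligible for expiration.

    Assumption: eligibility = creation time + N days, rounded up to the
    next midnight UTC. Treat the result as an estimate, not a guarantee.
    """
    eligible = created_at.astimezone(timezone.utc) + timedelta(days=expiration_in_days)
    midnight = eligible.replace(hour=0, minute=0, second=0, microsecond=0)
    if eligible == midnight:
        return eligible
    return midnight + timedelta(days=1)


if __name__ == "__main__":
    created = datetime(2023, 8, 8, 15, 30, tzinfo=timezone.utc)
    print(estimated_expiry(created).isoformat())
```

Under that assumption a ZIP can stay downloadable for noticeably longer than 24 hours, so a presigned URL with its own shorter expiry may still be worth considering.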

@RyanBuhrWA

FWIW, seconded! This would be a super handy feature for us.

@ytsipun
Member

ytsipun commented Aug 6, 2023

#160

mfriesen added a commit that referenced this issue Aug 8, 2023
* feat: adds documents/compress request handler

* feat: adds OpenAPI spec for /documents/compress

* refactor: OAPI spec

* feat: extends the tests

* feat: adds retention policy to tempfiles in STAGING

* refactor: prettified the tests

* refactor: OAPI spec

* feat: DocumentCompressor initial

* refactor: used multipart upload only for large files

* refactor: handled IOException, provided multipart upload S3

* refactor: refactored

* refactor: resolved merge conflicts

* refactor: refactored after review

* refactor: refactored

* refactor: extended the tests

* refactor: refactored

* Update build.gradle

* Update api.yaml

* Squashed commit of the following:

commit 60eb79e
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Mon Jul 31 23:11:50 2023 -0500

    fixed JWT API gateway definition

commit acac8d6
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Mon Jul 31 00:08:30 2023 -0500

    tweaked API spec

commit 5ec3caf
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sun Jul 30 23:50:53 2023 -0500

    update sharing API

commit af9ad6d
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sun Jul 30 23:28:56 2023 -0500

    added PATCH /shares/folders/{documentId}

commit 1a245b0
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sun Jul 30 22:09:43 2023 -0500

    fixed sorting results on getBatch

commit e61cf72
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sun Jul 30 21:43:02 2023 -0500

    added ScanIndexForward to QueryConfig

commit 5a590bc
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sun Jul 30 20:35:42 2023 -0500

    added IndexName support to queryBeginsWith() method

commit 1f88cd5
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sat Jul 29 21:51:19 2023 -0500

    removed DocumentEventService

commit 54eb09c
Author: Mike Friesen <mfriesen@gmail.com>
Date:   Sat Jul 29 21:40:20 2023 -0500

    changed Document Folders to use FolderIndexRecord object

* Revert "Squashed commit of the following:"

This reverts commit 6803be7.

* update securityScheme for IAM / API Key

* feat: adds S3MultipartUploader

* refactor: refactored

* refactor: added logging

* refactor: adapted DocumentCompressor to v1.12 changes, added checksum tests

* test: skipped ZIP in tempfiles event

---------

Co-authored-by: Mike Friesen <mfriesen@gmail.com>
@ytsipun
Member

ytsipun commented Aug 10, 2023

Introduced in v1.12.0.

@ytsipun ytsipun closed this as completed Aug 10, 2023
mfriesen added a commit that referenced this issue Aug 14, 2023
* #130 - Feature: Added PUT/PATCH /documents/{documentId}/tags to allow adding/updating multiple tags to a document at once

* #131 - Feature: Added POST /documents/compress to create a ZIP file of multiple documents' contents.

* #133 - Feature: Added PATCH /documents/tags to allow adding tags to documents based on tag search criteria

* #138 - Bug: Added validation for reserved tag keys when using POST /documents

* #144 - Bug: DELETE /indices/folder returns 500 when invalid siteId is used

* #145 - Feature: Added Typesense support to API GET documents/{id}/fulltext

* #150 - Bug: Renamed module 'fulltext' to 'opensearch'

* #151 - Bug: Documentation for POST /documents/upload, missing contentLength

* #158 - Bug: Improved Actions API validation

* #159 - Feature: API Keys added READ / WRITE / DELETE permissions

* #161 - Bug: POST /documents/upload generates incorrect URL