Add the ability to create a ZIP file based on document IDs #131

reganwolfrom · 2023-05-30T22:29:16Z

For now, this request would take a list of document IDs and start an asynchronous process to retrieve these documents and compress them into a ZIP file. There would be no functionality at this time to include child documents automatically; if a document has child documents, they would need to be specifically requested using their document IDs.
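
To make the scope concrete, here is a minimal Python sketch of the compression step, assuming an in-memory dictionary in place of real document storage. The `DOCUMENTS` mapping, the filenames, and the `build_zip` helper are all hypothetical and only illustrate that nothing beyond the requested IDs ends up in the archive.

```python
import io
import zipfile

# Hypothetical in-memory stand-in for document storage: document ID -> (filename, bytes).
DOCUMENTS = {
    "aaa-aaa-aaa-aaa": ("report.txt", b"quarterly report"),
    "aaa-bbb-bbb-bbb": ("notes.txt", b"meeting notes"),
    "aaa-ccc-ccc-ccc": ("child.txt", b"child of the report"),
}


def build_zip(document_ids):
    """Return ZIP bytes containing only the explicitly requested documents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for document_id in document_ids:
            filename, content = DOCUMENTS[document_id]  # raises KeyError for unknown IDs
            # Prefix with the ID so two documents sharing a filename do not collide.
            archive.writestr(f"{document_id}/{filename}", content)
    return buffer.getvalue()


zip_bytes = build_zip(["aaa-aaa-aaa-aaa", "aaa-bbb-bbb-bbb"])
names = zipfile.ZipFile(io.BytesIO(zip_bytes)).namelist()
```

The third document stands in for a child document: it is left out because its ID was not in the request.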

Request Endpoint:
POST /compress
{
  "documentIds": ["aaa-aaa-aaa-aaa", "aaa-bbb-bbb-bbb"]
}

This would trigger the process to create a ZIP. This could be a Lambda or an AWS Fargate task, depending on what makes more sense. We may need to use SQS for this processing, and may also need a compression task ID and a status endpoint to report progress. Alternatively, we could return an S3 object URL right away and have compression update the object from empty to the finished ZIP file, but a requestor could then fetch the object while compression is still ongoing and receive only some of the selected documents.
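
As a rough illustration of the task ID plus status idea, the following Python sketch uses a local `queue.Queue` as a stand-in for SQS and a dictionary as a stand-in for a status table. Every name here (`request_compression`, `run_worker_once`, `get_status`, the status values, the placeholder URL) is an assumption made for the example, not an existing API.

```python
import queue
import uuid

# Hypothetical stand-ins: a local queue for SQS and a dict for the task-status table.
task_queue = queue.Queue()
task_status = {}


def request_compression(document_ids):
    """Enqueue a compression task and return its ID immediately."""
    task_id = str(uuid.uuid4())
    task_status[task_id] = {
        "status": "PENDING",
        "documentIds": list(document_ids),
        "url": None,
    }
    task_queue.put(task_id)
    return task_id


def run_worker_once():
    """Process one queued task, as a Lambda or Fargate worker might."""
    task_id = task_queue.get()
    task = task_status[task_id]
    task["status"] = "IN_PROGRESS"
    # ... retrieve the documents and write the ZIP here ...
    # The URL is recorded only after the archive is complete.
    task["url"] = f"https://example.invalid/tempfiles/{task_id}.zip"  # placeholder URL
    task["status"] = "COMPLETE"
    task_queue.task_done()


def get_status(task_id):
    """Report status; expose the URL only once the ZIP is finished."""
    task = task_status[task_id]
    url = task["url"] if task["status"] == "COMPLETE" else None
    return {"status": task["status"], "url": url}
```

Withholding the URL until the status is `COMPLETE` is one way to avoid the partial-ZIP problem described above.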

Once the compression has completed (and the requestor can determine that completed status), we may want to have them access the ZIP through a signed URL; alternatively, we may need tighter control over how many times the file can be accessed. It's possible to add an expiration time, or we could add a delete endpoint if we have an ID separate from the URL.

So if we need a "download id" for each compression event, we could use POST /downloads/compress and GET /downloads/{downloadId} to get status and/or create and retrieve an S3 signed URL, and DELETE /downloads/{downloadId} that could delete the ZIP file.
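
A minimal Python sketch of how these three routes could fit together, assuming an in-memory store keyed by download ID. The `handle` dispatcher, the status codes, and the response fields are hypothetical choices for illustration; a real handler would generate an S3 signed URL instead of the placeholder.

```python
import uuid

# Hypothetical in-memory store: download ID -> record. A real service would
# persist this and keep the ZIP itself in S3.
downloads = {}


def handle(method, path, body=None):
    """Dispatch the three proposed routes; returns (status_code, response_body)."""
    if method == "POST" and path == "/downloads/compress":
        download_id = str(uuid.uuid4())
        downloads[download_id] = {
            "documentIds": body["documentIds"],
            "status": "PENDING",
        }
        return 202, {"downloadId": download_id}

    prefix = "/downloads/"
    if path.startswith(prefix):
        download_id = path[len(prefix):]
        record = downloads.get(download_id)
        if record is None:
            return 404, {"message": "download not found"}
        if method == "GET":
            response = {"status": record["status"]}
            if record["status"] == "COMPLETE":
                # Placeholder; a real handler would create a signed URL here.
                response["url"] = f"https://example.invalid/{download_id}.zip"
            return 200, response
        if method == "DELETE":
            del downloads[download_id]  # a real handler would also delete the ZIP object
            return 200, {"message": "download deleted"}

    return 405, {"message": "unsupported route"}
```

For example, `handle("POST", "/downloads/compress", {"documentIds": ["aaa-aaa-aaa-aaa"]})` returns a new download ID, which can then be passed to the GET and DELETE routes.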

First step will be to determine if it's even possible to do without a "download id"; however, it may make more sense to just use a "download id" with the GET and DELETE endpoints, to enable more control by an application with this process.

reganwolfrom · 2023-05-30T22:29:37Z

@formkiqMike, can you take a look at this description and see if any info should be changed? Thanks!

mfriesen · 2023-05-30T22:59:26Z

I think it should work like Google Drive, where you select a list of documents and then those documents are zipped up into a temporary file and downloaded.

This could probably be piggybacked on the DocumentActionsProcessor and the DocumentActionsQueue.

The response would return a presigned URL to a file in the Documents Bucket tempfiles folder. A Lifecycle rule would then need to be added to the Documents Bucket, like the one the OCR bucket already has:

  LifecycleConfiguration:
    Rules:
      - Id: ExpiryRule
        Prefix: tempfiles/
        Status: Enabled
        ExpirationInDays: 1
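
A rough Python sketch of what `ExpirationInDays: 1` means for how long a ZIP might stay available. It assumes S3's documented behavior of adding the number of days to the object's creation time and rounding up to the next midnight UTC. The `lifecycle_expiry` helper is hypothetical, and its handling of an object created exactly at midnight is an assumption; actual removal can also lag behind this time.

```python
from datetime import datetime, timedelta, timezone


def lifecycle_expiry(created_at, expiration_in_days=1):
    """Earliest time the object becomes eligible for expiration (UTC)."""
    target = created_at.astimezone(timezone.utc) + timedelta(days=expiration_in_days)
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    # Round up to the next midnight UTC unless the target is already on one (assumption).
    return midnight if target == midnight else midnight + timedelta(days=1)


created = datetime(2023, 8, 10, 15, 17, tzinfo=timezone.utc)
expiry = lifecycle_expiry(created)
```

Under these assumptions, a ZIP created mid-afternoon would remain in the bucket for more than 24 hours, so a presigned URL with a shorter expiry would still be the main access control.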

RyanBuhrWA · 2023-06-01T20:54:04Z

FWIW, seconded! This would be a super handy feature for us.

ytsipun · 2023-08-06T23:25:49Z

#160

* feat: adds documents/compress request handler
* feat: adds OpenAPI spec for /documents/compress
* refactor: OAPI spec
* feat: extends the tests
* feat: adds retention policy to tempfiles in STAGING
* refactor: prettified the tests
* refactor: OAPI spec
* feat: DocumentCompressor initial
* refactor: used multipart upload only for large files
* refactor: handled IOException, provided multipart upload S3
* refactor: refactored
* refactor: resolved merge conflicts
* refactor: refactored after review
* refactor: refactored
* refactor: extended the tests
* refactor: refactored
* Update build.gradle
* Update api.yaml
* Squashed commit of the following:
  commit 60eb79e Author: Mike Friesen <mfriesen@gmail.com> Date: Mon Jul 31 23:11:50 2023 -0500 fixed JWT API gateway definition
  commit acac8d6 Author: Mike Friesen <mfriesen@gmail.com> Date: Mon Jul 31 00:08:30 2023 -0500 tweaked API spec
  commit 5ec3caf Author: Mike Friesen <mfriesen@gmail.com> Date: Sun Jul 30 23:50:53 2023 -0500 update sharing API
  commit af9ad6d Author: Mike Friesen <mfriesen@gmail.com> Date: Sun Jul 30 23:28:56 2023 -0500 added PATCH /shares/folders/{documentId}
  commit 1a245b0 Author: Mike Friesen <mfriesen@gmail.com> Date: Sun Jul 30 22:09:43 2023 -0500 fixed sorting results on getBatch
  commit e61cf72 Author: Mike Friesen <mfriesen@gmail.com> Date: Sun Jul 30 21:43:02 2023 -0500 added ScanIndexForward to QueryConfig
  commit 5a590bc Author: Mike Friesen <mfriesen@gmail.com> Date: Sun Jul 30 20:35:42 2023 -0500 added IndexName support to queryBeginsWith() method
  commit 1f88cd5 Author: Mike Friesen <mfriesen@gmail.com> Date: Sat Jul 29 21:51:19 2023 -0500 removed DocumentEventService
  commit 54eb09c Author: Mike Friesen <mfriesen@gmail.com> Date: Sat Jul 29 21:40:20 2023 -0500 changed Document Folders to use FolderIndexRecord object
* Revert "Squashed commit of the following:" This reverts commit 6803be7.
* update securityScheme for IAM / API Key
* feat: adds S3MultipartUploader
* refactor: refactored
* refactor: added logging
* refactor: adapted DocumentCompressor to v1.12 changes, added checksum tests
* test: skipped ZIP in tempfiles event

---------

Co-authored-by: Mike Friesen <mfriesen@gmail.com>

ytsipun · 2023-08-10T15:17:47Z

Introduced in v1.12.0.

* #130 - Feature: Added PUT/PATCH /documents/{documentId}/tags to allow adding/updating multiple tags to a document at once
* #131 - Feature: Added POST /documents/compress to create a ZIP file of multiple documents' contents
* #133 - Feature: Added PATCH /documents/tags to allow adding tags to documents based on tag search criteria
* #138 - Bug: Added validation for reserved tag keys when using POST /documents
* #144 - Bug: DELETE /indices/folder returns 500 when invalid siteId is used
* #145 - Feature: Added Typesense support to API GET documents/{id}/fulltext
* #150 - Bug: Renamed module 'fulltext' to 'opensearch'
* #151 - Bug: Documentation for POST /documents/upload, missing contentLength
* #158 - Bug: Improved Actions API validation
* #159 - Feature: API Keys added READ / WRITE / DELETE permissions
* #161 - Bug: POST /documents/upload generates incorrect URL

reganwolfrom added the enhancement New feature or request label May 30, 2023

reganwolfrom assigned ytsipun May 30, 2023

mfriesen added this to the v1.12.0 milestone Jun 16, 2023

ytsipun mentioned this issue Jul 30, 2023

feat: adds the ability to create a ZIP file based on document ids #160

Merged

ytsipun closed this as completed Aug 10, 2023
