
feat(model): Require size when writing to ProvenanceFileStorage #7292

Merged: 1 commit into main from provenance-file-storage-add-size on Jul 17, 2023

Conversation

mnonnenmacher (Member)

Add a `size` argument to the `putData` function of the `ProvenanceFileStorage` which represents the size of the input stream to be written.

This is a precondition for being able to use the PostgreSQL large object storage as a backend for the storage, which is a requirement in the ongoing ORT Server development.

The current implementations of `ProvenanceFileStorage` do not require the size of the input stream, but the size is easily available for all callers of the `putData` function.
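As a rough illustration of the change, here is a minimal sketch of what the extended interface could look like. The `putData` and `ProvenanceFileStorage` names come from this PR; the exact parameter names, the companion functions, and the use of `KnownProvenance` are assumptions for illustration only:

```kotlin
import java.io.InputStream

import org.ossreviewtoolkit.model.KnownProvenance

interface ProvenanceFileStorage {
    /** Return whether data for the given [provenance] is available. */
    fun hasData(provenance: KnownProvenance): Boolean

    /**
     * Store [data] of the given [size] for [provenance]. The size of the stream must now be
     * provided by the caller, so that backends which need the length upfront can be supported.
     */
    fun putData(provenance: KnownProvenance, data: InputStream, size: Long)

    /** Return the stored data for [provenance], or null if there is none. */
    fun getData(provenance: KnownProvenance): InputStream?
}
```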

@mnonnenmacher mnonnenmacher requested a review from a team as a code owner July 14, 2023 14:42
@@ -22,6 +22,7 @@ package org.ossreviewtoolkit.model.utils
import io.kotest.core.spec.style.WordSpec
import io.kotest.matchers.shouldBe

import java.io.ByteArrayInputStream
Member

How about adding, as "syntactic sugar", a `fun putData(provenance, bytes: ByteArray)` overload, to make the caller less verbose?
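A sketch of the suggested convenience overload, assuming the interface shape above; as an extension function it would keep the interface itself unchanged (names are hypothetical, this is not part of the actual change):

```kotlin
import java.io.ByteArrayInputStream

// Hypothetical "syntactic sugar" overload: wrap a ByteArray so that callers do not have to
// construct the stream and pass the size themselves.
fun ProvenanceFileStorage.putData(provenance: KnownProvenance, bytes: ByteArray) =
    putData(provenance, ByteArrayInputStream(bytes), bytes.size.toLong())
```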

Member Author

You mean just for the test code? Because I think the change in FileListResolver is not that verbose.

Member
@fviernau fviernau Jul 14, 2023

I meant for the file list resolver. Anyhow, I still do not understand why it is needed at all. Does the commit message intend to reference `int lo_read(PGconn *conn, int fd, char *buf, size_t len);`?

I'm asking because for `len` one should pass the size of the buffer, not the size of the whole stream.

Member Author

I don't know enough about the implementation; maybe it's also a requirement of the JDBC driver or similar. But after reading some docs I think it should be possible to remove it again later on. Therefore I have rephrased the commit message so that this is basically an ask from the people working on the ORT Server to introduce this to unblock us for now, pending later investigation of whether it can be removed again.

Member

Hm, is there really no pointer to any API docs which show that it's needed?

Member

I'll leave the decision whether to merge up to you guys, even though I would prefer to wait until there is a clear reason that it is really required, because all the API docs we have seen so far seem to provide a way without the length.

Member

TBH, I'm also reluctant towards this "dummy" change. @mnonnenmacher, could you maybe explain why it's easier to make this change here than to change the server to not require the length?

Member

To give you a bit of background information:
In ORT Server, we have introduced a generic storage API for storing all kinds of data. It can be backed by different implementations. Postgres is one of those; another one could be a file storage like Azure Storage Accounts. When designing the API we had the different possible implementations in mind, and some of them require the size of the data to be known beforehand. This is, for instance, needed by the putBlob operation from Azure, where requests require a correct Content-Length header.

For Postgres itself, the data size is not required for writing or reading blobs. However, the implementation in ORT Server uses the size information to distinguish between data that can be read into memory and large data that needs to be streamed to the client. If the size is below a configurable threshold, the data is read directly and sent as the response to the client. Otherwise, we need to work with temporary files to buffer the data. This is because Postgres blobs can only be accessed within a transaction, but when sending the HTTP response to the client, the transaction is already closed.
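A rough sketch of the size-based strategy described above, assuming a configurable threshold; the function names and response helpers are hypothetical stand-ins, not the actual ORT Server code:

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical response helpers; in ORT Server these would be the HTTP framework's response calls.
fun respondBytes(bytes: ByteArray) { /* send the bytes as the HTTP response body */ }
fun respondFile(file: File) { /* stream the file as the HTTP response body */ }

fun serveBlob(size: Long, inMemoryThreshold: Long, openBlob: () -> InputStream) {
    if (size <= inMemoryThreshold) {
        // Small data: read it fully while the database transaction is still open and respond
        // directly from memory.
        respondBytes(openBlob().use { it.readBytes() })
    } else {
        // Large data: buffer it to a temporary file inside the transaction, because the blob can
        // only be accessed while the transaction is open, then stream the file to the client
        // after the transaction has been closed.
        val tempFile = File.createTempFile("blob", null)
        openBlob().use { input -> tempFile.outputStream().use { output -> input.copyTo(output) } }
        respondFile(tempFile)
    }
}
```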

Member

Thanks for the details @oheger-bosch. So if, instead of Postgres, the Azure backend were mentioned in the commit message as an example that really requires the length (like you do above), and the information from the last paragraph that ORT Server might use different "modes" depending on the size of the data also went into the commit message, I'd be OK with this change.

Member Author

I have updated the commit message.

@codecov

codecov bot commented Jul 14, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (04849fd) 61.20% compared to head (a0de798) 61.20%.

❗ Current head a0de798 differs from pull request most recent head 5c879da. Consider uploading reports for the commit 5c879da to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #7292   +/-   ##
=========================================
  Coverage     61.20%   61.20%           
  Complexity     1969     1969           
=========================================
  Files           335      335           
  Lines         16487    16488    +1     
  Branches       2350     2350           
=========================================
+ Hits          10091    10092    +1     
  Misses         5419     5419           
  Partials        977      977           
Flag Coverage Δ
test 36.59% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...src/main/kotlin/utils/FileProvenanceFileStorage.kt 63.63% <ø> (ø)
...main/kotlin/utils/PostgresProvenanceFileStorage.kt 92.68% <ø> (ø)
model/src/main/kotlin/utils/FileArchiver.kt 85.71% <100.00%> (ø)
scanner/src/main/kotlin/utils/FileListResolver.kt 85.71% <100.00%> (+0.71%) ⬆️

☔ View full report in Codecov by Sentry.

@mnonnenmacher mnonnenmacher force-pushed the provenance-file-storage-add-size branch 2 times, most recently from 76c0f8e to a0de798 on July 17, 2023 08:00
Add a `size` argument to the `putData` function of the
`ProvenanceFileStorage` which represents the size of the input stream
to be written.

This is currently a requirement to implement the `ProvenanceFileStorage`
for the ORT Server using a generic storage interface it provides. This
interface requires the size because some potential storage backends
require this, for example the putBlob [1] method from the Azure blob
storage requires a Content-Length header to be set. The size is also
required by the PostgreSQL implementation of this storage interface
which uses it to decide if the data is directly streamed to the client
or if it needs to be buffered in a file. This is necessary because
PostgreSQL requires that blobs are read in a transaction, but the time
the transaction stays open should not depend on the time it takes to
transfer the data to the client.

The current implementations of `ProvenanceFileStorage` do not require
the size of the input stream, but the size is easily available for all
callers of the `putData` function. However, if, for example, Azure should
be implemented as a storage backend in the future, the size argument
would be required as explained above.

[1]: https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
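The commit message above cites the Azure putBlob Content-Length requirement. To illustrate generically why a streaming upload can require the length upfront, independent of any particular Azure SDK: with plain `java.net.HttpURLConnection`, a fixed-length streaming upload needs the exact size before any body data is sent, because it is emitted as the Content-Length header. This is only a generic sketch; the URL and error handling are placeholders:

```kotlin
import java.io.InputStream
import java.net.HttpURLConnection
import java.net.URI

fun uploadWithKnownLength(url: String, data: InputStream, size: Long) {
    val connection = URI(url).toURL().openConnection() as HttpURLConnection
    connection.requestMethod = "PUT"
    connection.doOutput = true

    // Must be set before connecting: the given size is sent as the Content-Length header, and
    // the request fails if the body written afterwards does not match it exactly.
    connection.setFixedLengthStreamingMode(size)

    connection.outputStream.use { output -> data.copyTo(output) }
    check(connection.responseCode in 200..299) { "Upload failed with HTTP ${connection.responseCode}." }
}
```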
@mnonnenmacher mnonnenmacher force-pushed the provenance-file-storage-add-size branch from a0de798 to 5c879da on July 17, 2023 08:16
@mnonnenmacher mnonnenmacher merged commit 119a58d into main Jul 17, 2023
17 of 19 checks passed
@mnonnenmacher mnonnenmacher deleted the provenance-file-storage-add-size branch July 17, 2023 09:48
@mnonnenmacher
Member Author

Merged as the failing GoModFunTest and GitWorkingTreeFunTest are unrelated to the change.

@sschuberth sschuberth added the labels "release notes" (Changes that should be mentioned in release notes) and "breaking API change" (Pull requests that break compatibility with previous API) on Jul 20, 2023