feat(model): Require size when writing to ProvenanceFileStorage
#7292
Conversation
@@ -22,6 +22,7 @@ package org.ossreviewtoolkit.model.utils
 import io.kotest.core.spec.style.WordSpec
 import io.kotest.matchers.shouldBe

+import java.io.ByteArrayInputStream
How about adding, as "syntactic sugar", a `fun putData(provenance, bytes: ByteArray)` overload, to make the caller less verbose?
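A minimal sketch of such an overload, assuming a simplified interface (the `Provenance` and `ProvenanceFileStorage` types here are stand-ins, not the real ORT declarations):

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Simplified stand-ins for the real ORT types, only to keep this sketch self-contained.
class Provenance

interface ProvenanceFileStorage {
    fun putData(provenance: Provenance, data: InputStream, size: Long)
}

// Hypothetical "syntactic sugar" overload: a ByteArray knows its own size, so the
// caller does not have to wrap the stream and spell out the length manually.
fun ProvenanceFileStorage.putData(provenance: Provenance, data: ByteArray) =
    putData(provenance, ByteArrayInputStream(data), data.size.toLong())
```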
You mean just for the test code? Because I think the change in FileListResolver is not that verbose.
I meant for the file list resolver. Anyhow, I still do not understand why it is needed at all. Does the commit message intend to reference `int lo_read(PGconn *conn, int fd, char *buf, size_t len);`? I'm asking because for `len` one should pass the size of the buffer, not the size of the whole stream.
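To illustrate the point about `len`: when a stream is copied in chunks, the length passed to each low-level read bounds that single read only and is unrelated to the total stream size. A hedged Kotlin sketch using a plain `InputStream` (not the libpq API):

```kotlin
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Copy a stream in fixed-size chunks. The `buffer.size` argument of each read()
// plays the role of `len` in lo_read(): it bounds one read call and does not
// need to match the total size of the stream being copied.
fun copyInChunks(input: InputStream, chunkSize: Int = 8192): ByteArray {
    val output = ByteArrayOutputStream()
    val buffer = ByteArray(chunkSize)
    while (true) {
        val read = input.read(buffer, 0, buffer.size)
        if (read == -1) break
        output.write(buffer, 0, read)
    }
    return output.toByteArray()
}
```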
I don't know enough about the implementation; maybe it's also a requirement of the JDBC driver or so. But after reading some docs I think it should be possible to remove this again later on. Therefore I have rephrased the commit message so that this is basically an ask from the people working on the ORT Server to introduce this to unblock us for now, pending later investigation of whether it can be removed again.
Hm, is there really no pointer to any API docs that shows it's needed?
I'll leave the decision whether to merge up to you guys, even though I'd prefer to wait until there is a clear reason that it is really required, because all the API docs we've seen so far seem to provide a way without the length.
TBH, I'm also reluctant about this "dummy" change. @mnonnenmacher, could you maybe explain why it's easier to make this change here than to change the server to not require `length`?
To give you a bit of background information:

In ORT Server, we have introduced a generic storage API for storing all kinds of data. It can be backed by different implementations: Postgres is one of them, and another could be a file storage like Azure Storage Accounts. When designing the API we had the different possible implementations in mind, and some of them require the size of the data to be known beforehand. This is for instance needed by the putBlob operation from Azure, where requests require a correct Content-Length header.

For Postgres itself, the data size is not required for writing or reading blobs. However, the implementation in ORT Server uses the size information to distinguish between data that can be read into memory and large data that needs to be streamed to the client. If the size is below a configurable threshold, the data is read directly and sent as the response to the client. Otherwise, we need to work with temporary files to buffer the data. This is because Postgres blobs can only be accessed within a transaction, but by the time the HTTP response is sent to the client, the transaction is already closed.
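The size-based dispatch described above could look roughly like this. All names here are invented for the sketch; this is not the actual ORT Server code:

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical dispatch modeled on the behavior described above: small blobs are
// read into memory and can be sent to the client directly; large blobs are
// spooled to a temporary file so the database transaction can be closed before
// the HTTP response is streamed.
sealed interface BlobPayload
class InMemory(val bytes: ByteArray) : BlobPayload
class Spooled(val file: File) : BlobPayload

fun readBlob(data: InputStream, size: Long, inMemoryThreshold: Long): BlobPayload =
    if (size <= inMemoryThreshold) {
        InMemory(data.readBytes())
    } else {
        val file = File.createTempFile("blob", ".tmp").apply { deleteOnExit() }
        file.outputStream().use { data.copyTo(it) }
        Spooled(file)
    }
```

This is exactly why the size must be known before the write or read starts: the decision between the two code paths cannot be made after the stream has already been consumed.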
Thanks for the details @oheger-bosch. So if, instead of Postgres, the Azure backend were mentioned in the commit message as an example that really requires the length (like you do above), and the information from the last paragraph, that ORT Server might use different "modes" depending on the size of the data, also went into the commit message, I'd be OK with this change.
I have updated the commit message.
Codecov Report

Patch coverage:

Additional details and impacted files:

@@            Coverage Diff            @@
##              main   #7292   +/-   ##
=========================================
  Coverage    61.20%  61.20%
  Complexity    1969    1969
=========================================
  Files          335     335
  Lines        16487   16488     +1
  Branches      2350    2350
=========================================
+ Hits         10091   10092     +1
  Misses        5419    5419
  Partials       977     977

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Force-pushed from 76c0f8e to a0de798.
Add a `size` argument to the `putData` function of the `ProvenanceFileStorage` which represents the size of the input stream to be written.

This is currently a requirement for implementing the `ProvenanceFileStorage` for the ORT Server using the generic storage interface it provides. This interface requires the size because some potential storage backends require it; for example, the putBlob [1] method of the Azure blob storage requires a Content-Length header to be set. The size is also required by the PostgreSQL implementation of this storage interface, which uses it to decide whether the data is directly streamed to the client or needs to be buffered in a file. This is necessary because PostgreSQL requires that blobs are read in a transaction, but the time the transaction stays open should not depend on the time it takes to transfer the data to the client.

The current implementations of `ProvenanceFileStorage` do not require the size of the input stream, but the size is easily available for all callers of the `putData` function. However, if for example Azure should be implemented as a storage backend in the future, the size argument would be required as explained above.

[1]: https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.io>
Force-pushed from a0de798 to 5c879da.
Frank said to leave the discussion to us.
Merged as the failing
Add a `size` argument to the `putData` function of the `ProvenanceFileStorage` which represents the size of the input stream to be written.

This is a precondition for being able to use the PostgreSQL large object storage as a backend for the storage [1], which is a requirement in the ongoing ORT Server development.

The current implementations of `ProvenanceFileStorage` do not require the size of the input stream, but the size is easily available for all callers of the `putData` function.
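The observation that "the size is easily available for all callers" can be made concrete with a short sketch. The types here are simplified stand-ins, not the real ORT declarations:

```kotlin
import java.io.File
import java.io.InputStream

// Stand-ins for the real ORT types, only to make the sketch self-contained.
class Provenance

interface ProvenanceFileStorage {
    fun putData(provenance: Provenance, data: InputStream, size: Long)
}

// Callers typically have the size at hand already, e.g. when storing a file
// its length is available without consuming the stream:
fun storeFile(storage: ProvenanceFileStorage, provenance: Provenance, file: File) {
    file.inputStream().use { storage.putData(provenance, it, file.length()) }
}
```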