feat(model): Require size when writing to ProvenanceFileStorage #7292

Merged 1 commit on Jul 17, 2023
@@ -22,6 +22,7 @@ package org.ossreviewtoolkit.model.utils
import io.kotest.core.spec.style.WordSpec
import io.kotest.matchers.shouldBe

import java.io.ByteArrayInputStream
Member

How about adding, as "syntactic sugar", a `fun putData(provenance, bytes: ByteArray)` overload, so the caller does not have to be that verbose?
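For illustration, the suggested convenience overload could look roughly like this (a sketch using a simplified stand-in interface, not the actual ORT types):

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Simplified stand-in for ProvenanceFileStorage, reduced to the one
// function relevant here (the real interface takes a KnownProvenance).
interface Storage {
    fun putData(provenance: String, data: InputStream, size: Long)
}

// The suggested "syntactic sugar": an extension that wraps a ByteArray in a
// stream and derives the size, so callers do not have to do both themselves.
fun Storage.putData(provenance: String, data: ByteArray) =
    putData(provenance, ByteArrayInputStream(data), data.size.toLong())
```

With such an overload, the byte-array call sites would shrink back to a single line per write.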

Member Author

You mean just for the test code? Because I think the change in FileListResolver is not that verbose.

Member (@fviernau, Jul 14, 2023)

I meant for the file list resolver. Anyhow, I still do not understand why it is needed at all. Does the commit message intend to reference `int lo_read(PGconn *conn, int fd, char *buf, size_t len);`?

I'm asking because for `len` one should pass the size of the buffer, not the size of the whole stream.

Member Author

I don't know enough about the implementation; maybe it's also a requirement of the JDBC driver. But after reading some docs I think it should be possible to remove this again later on. I have therefore rephrased the commit message so that this is essentially a request from the people working on ORT Server to introduce this change to unblock them for now, pending later investigation into whether it can be removed again.

Member

Hm, is there really no pointer to any API docs showing that it's needed?

Member

I'll leave the decision whether to merge up to you guys, even though I'd prefer to wait until there is a clear reason why it is really required, because all the API docs we've seen so far seem to provide a way without the length.

Member

TBH, I'm also reluctant about this "dummy" change. @mnonnenmacher, could you maybe explain why it's easier to make this change here than to change the server to not require the length?

Member

To give you a bit of background information:
In ORT Server, we have introduced a generic storage API for storing all kinds of data. It can be backed by different implementations. Postgres is one of those; another one could be a file storage like Azure Storage Accounts. When designing the API we had the different possible implementations in mind, and some of them require the size of the data to be known beforehand. This is for instance needed by the putBlob operation from Azure, where requests require a correct Content-Length header.

For Postgres itself, the data size is not required for writing or reading blobs. However, the implementation in ORT Server uses the size information to distinguish between data that can be read into memory and large data that needs to be streamed to the client. If the size is below a configurable threshold, the data is read directly and sent as the response to the client. Otherwise, we need to work with temporary files to buffer the data: Postgres blobs can only be accessed within a transaction, but by the time the HTTP response is sent to the client, the transaction is already closed.
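The size-based dispatch described above could be sketched roughly like this (assumed logic with an invented threshold default, not the actual ORT Server implementation):

```kotlin
import java.io.File
import java.io.InputStream

// Result of reading a blob: either fully buffered in memory, or spooled to a
// temporary file from which the HTTP response can be streamed after the
// database transaction has been closed.
sealed class BlobData {
    data class InMemory(val bytes: ByteArray) : BlobData()
    data class Spooled(val file: File) : BlobData()
}

// Sketch: below the threshold the data is read directly; above it, the data
// is copied to a temporary file, because Postgres large objects are only
// accessible while the transaction is still open.
fun readBlob(data: InputStream, size: Long, thresholdBytes: Long = 8L * 1024 * 1024): BlobData =
    if (size <= thresholdBytes) {
        BlobData.InMemory(data.use { it.readBytes() })
    } else {
        val tmp = File.createTempFile("blob", ".tmp")
        tmp.outputStream().use { out -> data.use { it.copyTo(out) } }
        BlobData.Spooled(tmp)
    }
```

The key point is that the decision must be made before the stream is consumed, which is only possible if the size is known up front.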

Member

Thanks for the details @oheger-bosch. So if, instead of Postgres, the Azure backend were mentioned in the commit message as an example that really requires the length (like you do above), and the information from the last paragraph that ORT Server may use different "modes" depending on the size of the data also went into the commit message, I'd be OK with this change.

Member Author

I have updated the commit message.

import java.io.InputStream

import org.ossreviewtoolkit.model.ArtifactProvenance
@@ -60,16 +61,23 @@ class PostgresProvenanceFileStorageFunTest : WordSpec({
}

"return true when data for the given provenance has been added" {
storage.putData(VCS_PROVENANCE, InputStream.nullInputStream())
storage.putData(VCS_PROVENANCE, InputStream.nullInputStream(), 0L)

storage.hasData(VCS_PROVENANCE) shouldBe true
}
}

"getData()" should {
"return the data corresponding to the given provenance given such data has been added" {
storage.putData(VCS_PROVENANCE, "VCS".byteInputStream())
storage.putData(SOURCE_ARTIFACT_PROVENANCE, "source artifact".byteInputStream())
val vcsByteArray = "VCS".toByteArray()
val sourceArtifactByteArray = "source artifact".toByteArray()

storage.putData(VCS_PROVENANCE, ByteArrayInputStream(vcsByteArray), vcsByteArray.size.toLong())
storage.putData(
SOURCE_ARTIFACT_PROVENANCE,
ByteArrayInputStream(sourceArtifactByteArray),
sourceArtifactByteArray.size.toLong()
)

storage.getData(VCS_PROVENANCE) shouldNotBeNull { String(use { readBytes() }) shouldBe "VCS" }
storage.getData(SOURCE_ARTIFACT_PROVENANCE) shouldNotBeNull {
@@ -78,8 +86,19 @@
}

"return the overwritten file corresponding to the given provenance" {
storage.putData(SOURCE_ARTIFACT_PROVENANCE, "source artifact".byteInputStream())
storage.putData(SOURCE_ARTIFACT_PROVENANCE, "source artifact updated".byteInputStream())
val sourceArtifactByteArray = "source artifact".toByteArray()
val sourceArtifactUpdatedByteArray = "source artifact updated".toByteArray()

storage.putData(
SOURCE_ARTIFACT_PROVENANCE,
ByteArrayInputStream(sourceArtifactByteArray),
sourceArtifactByteArray.size.toLong()
)
storage.putData(
SOURCE_ARTIFACT_PROVENANCE,
ByteArrayInputStream(sourceArtifactUpdatedByteArray),
sourceArtifactUpdatedByteArray.size.toLong()
)

storage.getData(SOURCE_ARTIFACT_PROVENANCE) shouldNotBeNull {
String(use { readBytes() }) shouldBe "source artifact updated"
model/src/main/kotlin/utils/FileArchiver.kt (1 addition & 1 deletion)
@@ -90,7 +90,7 @@ class FileArchiver(

logger.info { "Archived directory '$directory' in $zipDuration." }

val writeDuration = measureTime { storage.putData(provenance, zipFile.inputStream()) }
val writeDuration = measureTime { storage.putData(provenance, zipFile.inputStream(), zipFile.length()) }

logger.info { "Wrote archive of directory '$directory' to storage in $writeDuration." }

model/src/main/kotlin/utils/FileProvenanceFileStorage.kt (1 addition & 1 deletion)
@@ -59,7 +59,7 @@ class FileProvenanceFileStorage(
return storage.exists(filePath)
}

override fun putData(provenance: KnownProvenance, data: InputStream) {
override fun putData(provenance: KnownProvenance, data: InputStream, size: Long) {
storage.write(getFilePath(provenance), data)
}

@@ -86,7 +86,7 @@ class PostgresProvenanceFileStorage(
}.first()[table.provenance.count()].toInt()
} == 1

override fun putData(provenance: KnownProvenance, data: InputStream) {
override fun putData(provenance: KnownProvenance, data: InputStream, size: Long) {
database.transaction {
table.deleteWhere {
table.provenance eq provenance.storageKey()
model/src/main/kotlin/utils/ProvenanceFileStorage.kt (3 additions & 3 deletions)
@@ -32,10 +32,10 @@ interface ProvenanceFileStorage {
fun hasData(provenance: KnownProvenance): Boolean

/**
* Associate [provenance] with the given [data]. Replaces any existing association by [provenance]. The function
* implementation is responsible for closing the stream.
* Associate [provenance] with the given [data] of the provided [size]. Replaces any existing association by
* [provenance]. The function implementation is responsible for closing the stream.
*/
fun putData(provenance: KnownProvenance, data: InputStream)
fun putData(provenance: KnownProvenance, data: InputStream, size: Long)

/**
* Return the data associated by [provenance], or null if there is no such data. Note that it is the responsibility
scanner/src/main/kotlin/utils/FileListResolver.kt (3 additions & 1 deletion)
@@ -21,6 +21,7 @@ package org.ossreviewtoolkit.scanner.utils

import com.fasterxml.jackson.module.kotlin.readValue

import java.io.ByteArrayInputStream
import java.io.File

import org.ossreviewtoolkit.model.HashAlgorithm
@@ -51,7 +52,8 @@ internal class FileListResolver(
}

private fun ProvenanceFileStorage.putFileList(provenance: KnownProvenance, fileList: FileList) {
putData(provenance, fileList.toYaml().byteInputStream())
val byteArray = fileList.toYaml().toByteArray()
putData(provenance, ByteArrayInputStream(byteArray), byteArray.size.toLong())
}

private fun ProvenanceFileStorage.getFileList(provenance: KnownProvenance): FileList? {