Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reactivate "Performance improvements for Blob" #1167

Merged
merged 8 commits into from
Oct 24, 2022
Merged

Reactivate "Performance improvements for Blob" #1167

merged 8 commits into from
Oct 24, 2022

Conversation

jberkel
Copy link
Collaborator

@jberkel jberkel commented Oct 23, 2022

@jberkel jberkel added this to the 0.14.0 milestone Oct 23, 2022
@jberkel jberkel merged commit 3977813 into master Oct 24, 2022
@jberkel jberkel deleted the NSData-blobs branch October 24, 2022 11:12
Copy link
Owner

@stephencelis stephencelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jberkel Just caught wind of this change and I think it should be reverted quickly. It introduces a serious security issue for folks accessing the bytes interface.

public init(bytes: [UInt8]) {
self.bytes = bytes
public var bytes: UnsafePointer<UInt8> {
data.bytes.assumingMemoryBound(to: UInt8.self)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately undefined behavior and could lead to crashes. We should revert this.


public let bytes: [UInt8]
public let data: NSData
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these days NSData is going to make things slower in Swift. If we want to make fundamental changes like this they should be benchmarked. I have some other repos with examples that use swift-benchmark if that would help!

@stephencelis
Copy link
Owner

stephencelis commented Oct 28, 2022

@jberkel Sorry for the fly-by! And much appreciation for your continual maintenance of this 😄 Just don't want folks to get bit by this potential issue in production.

@stephencelis
Copy link
Owner

@jberkel I just pulled the 0.14.0 tag to prevent folks from taking this change for now, but please cut 0.14.1 when you have a chance to revert.

@jberkel
Copy link
Collaborator Author

jberkel commented Oct 28, 2022

@stephencelis Oh, hello. The PR this is based on sat around unchallenged for six years and now you're appearing here deus ex machina :)

Could you explain a bit more, I don't see the security issue, NSData.init(bytes:length:) always copies the data.
The previous code also had

let i8bufptr = UnsafeBufferPointer(start: bytes.assumingMemoryBound(to: UInt8.self), count: length)

so I don't see why this would be less safe.

@jberkel
Copy link
Collaborator Author

jberkel commented Oct 28, 2022

Apple's documentation mentions that

Accessing memory through the returned pointer is undefined if the memory has not been bound to T. To bind memory to T, use bindMemory(to:capacity:) instead of this method.

However, bindMemory(to:capacity:) doesn't really do anything to the underlying memory itself. The specification stresses that bound types need to be "layout compatible" for the behavior to be defined.

And as far as I can tell, the elementary type UInt8 should always be layout compatible: there are no alignment problems with single byte access.

@stephencelis
Copy link
Owner

@jberkel Again, sorry for the fly-by out of nowhere! Please let me take some time to try to flesh out my concerns. I'll start with the one that I think is the most pressing:

  1. Adding an API that returns an unsafe, dangling pointer:

    public var bytes: UnsafePointer<UInt8> {
      data.bytes.assumingMemoryBound(to: UInt8.self)
    }

    Dangling pointers can easily outlive the lifetime of the underlying memory and be deallocated before use. Swift provides scoped methods like withUnsafePointer that guarantee the lifetime of the pointer for the duration of the scope for a reason, but we're not doing the same here. And while NSData may copy the bytes from the original SQL result, I'm more worried that this NSData is the owner of the underlying memory and could deallocate the pointer before a SQLite.swift user interacts with it. This could lead to hard-to-diagnose crashes and other memory issues, and easily become an attack vector in Swift apps using SQLite.swift.

    Also note that Data, the Swift-friendly wrapper around NSData, does not provide an interface to this dangling pointer because it is not safe.

    I think Swift libraries should avoid exposing Unsafe* APIs like this in general, and when they do, the reason should be clearly documented and motivated. This is even a prerequisite of libraries Apple accepts into SSWG incubation. I hope you agree that we should probably follow suit.

  2. Updating the API to use a new data structure for performance improvements without validating the improvements with benchmarks. While it may be true that NSData and Data were anecdotally more performant when the original PR was opened six years ago, I'm pretty sure this is no longer the case today. Swift's performance has improved a lot since then, and while there may be opportunities to improve the library performance, I do think that it should be measured in actual benchmarks. And in the case of this change, I worry that it may have a measurably negative impact on library users.

  3. I don't think we should assume the underlying memory of NSData.bytes without very strong guarantees, and in general I think assumingMemoryBound(to:) should be avoided if there are safer alternatives. While the library may have other uses of assumingMemoryBound(to:), those should probably be audited to something safer, or the use should be clearly motivated/documented.

Hope these concerns are clear and make sense! I appreciate wanting to continually improve the library for its users, but I also want to make sure we don't ship something that could cause unintentional problems for them.

@jberkel
Copy link
Collaborator Author

jberkel commented Oct 31, 2022

@stephencelis

Adding an API that returns an unsafe, dangling pointer:

It's only a dangling pointer once the Blob / NSData is deallocated. Like in this case:

func test_dangling_pointer() {
    var unsafePointer: UnsafePointer<UInt8>? = nil
    do {
        var blob: Blob? = Blob(bytes: [42, 41, 40])
        unsafePointer = blob!.bytes
        blob = nil
    }
    XCTAssertEqual(unsafePointer?.pointee, 42)
}

Interestingly, even compiled with "guard malloc" and run with "malloc scribble" this test passes. It looks like the memory isn't freed immediately.

But yes, clearly some "shoot-in-the-foot" potential here.

... without validating the improvements with benchmarks.

That's a fair point, it should have been benchmarked. I'll try to do some benchmarking, more out of curiosity.

I don't think we should assume the underlying memory of NSData.bytes without very strong guarantees

This is probably the weakest argument, as stated above I don't see how this behavior can be undefined according to the Swift memory model.

This PR was rushed into the last release, which wasn't a good idea. On the other hand, it's been around for a few years and nobody challenged it, so I assumed it was ok to include.

I'll do a new release soon, don't have much time now because I'm travelling.

@stephencelis
Copy link
Owner

stephencelis commented Oct 31, 2022

It's only a dangling pointer once the Blob / NSData is deallocated

Sure, but APIs that return dangling pointers make it very easy for folks to make this mistake by holding onto the pointer and letting the data be deallocated. I think the job of a high-level library is to prevent folks from making these kinds of mistakes by not shipping interfaces that make it easy.

This is probably the weakest argument, as stated above I don't see how this behavior can be undefined according to the Swift memory model.

I'd like to be a little more cautious here as I don't see how we can guarantee it. We know that using unsafe interfaces can cause problems at runtime when used incorrectly. Might be worth asking the compiler engineers on the Swift forums if it is safe to use this API here and, if so, document the findings?

@jberkel jberkel mentioned this pull request Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants