Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose String.resize() #79177

Closed
wants to merge 1 commit into from

Conversation

dsnopek
Copy link
Contributor

@dsnopek dsnopek commented Jul 7, 2023

This supersedes PR #79156 (which didn't expose String.resize() the normal way, but instead added a function to gdextension_interface.h so it wouldn't be exposed to GDScript)

The one thing I'm concerned about (now that this would be exposed to GDScript) is that resize() takes the array size of the string (so, including the null terminator character), but length() returns the string length (so, without the null terminator aka 1 less than the array length).

Will this be too confusing to users? Would it make sense to also expose size() so that str.resize(str.size() + 2) clearly makes the string 2 characters longer? Or, will having both size() and length() just be even more confusing? Or, should the version of String.resize() that's exposed be made to work with string length, rather than array length?

Fixes godot-cpp issue godotengine/godot-cpp#1141 which originally came up when trying to compile the text_server_adv module in Godot as a GDExtension. See: #77532

@dsnopek dsnopek added this to the 4.x milestone Jul 7, 2023
@dsnopek dsnopek requested a review from bruvzg July 7, 2023 22:05
@dsnopek dsnopek requested a review from a team as a code owner July 7, 2023 22:05
@bruvzg
Copy link
Member

bruvzg commented Jul 7, 2023

Will this be too confusing to users? Would it make sense to also expose size()

I guess it makes sense to expose size as well.

@JoNax97
Copy link
Contributor

JoNax97 commented Jul 7, 2023

How does length/size work with multi-byte characters? I assume size() counts raw bytes, right?

<return type="int" />
<param index="0" name="size" type="int" />
<description>
Resizes this string to the new size. The size should be one more than desired string length (in order to accommodate the null terminator).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this make the String data mutable?

In most programming languages, Strings are immutable. I think this is true for all String methods in GDScript except for this one. Wouldn't it be better to return a copy that's resized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GDScript, you can already do:

var s = "abc"
s[2] = 3
print(s) # will print "ab3"

This was one of the points that @bruvzg brought up on my original PR (which aimed expose this only to GDExtension and not GDScript).

Wouldn't it be better to return a copy that's resized?

Unfortunately, this would defeat the purpose of having access to resize() from GDExtension. The specific use-case that triggered the quest to expose resize was to avoid an unnecessary copy in text_server_adv when compiling it as a GDExtension.

@dsnopek
Copy link
Contributor Author

dsnopek commented Jul 8, 2023

How does length/size work with multi-byte characters? I assume size() counts raw bytes, right?

String stores its data as an array of char32_t encoded as UTF-32, so each item in the array represents a single unicode character. size() counts the number of char32_t's in the array (including the null terminator) and length() does the same but excludes the null terminator.

@AThousandShips
Copy link
Member

AThousandShips commented Jul 8, 2023

Strings have to be immutable in GDScript AFAIK, as they are COW you can't modify them via methods but only by assignment, this is why there aren't any mutable methods to them (and why several mutable methods were removed previously, like erase in #54869)

@bruvzg
Copy link
Member

bruvzg commented Jul 8, 2023

Strings aren't immutable, there's an [] operator already.

@AThousandShips
Copy link
Member

AThousandShips commented Jul 8, 2023

My bad, not sure what has changed in core to make that possible

@bruvzg
Copy link
Member

bruvzg commented Jul 8, 2023

not sure what has changed in core to make that possible

I think it was always the case, at least it is in 3.5.

@AThousandShips
Copy link
Member

Does this actually work for properties though? That's the issue with mutability, not that variant of it can be mutated

Test for example:

class TestClass:
    var s0 := "abc"

func foo():
    var t0 := TestClass.new()
    print(t0.s0) # Prints "abc"
    t0.s0.resize(3)
    print(t0.s0) # Still prints "abc", not "ab"

@AThousandShips
Copy link
Member

This is why some methods were removed from Transform2D, like set_shear/scale/rotation because they were confusing

This also affects arrays, like:

var arr := ["abc"]
print(arr[0])
arr[0].resize(3)
print(arr[0])

@bruvzg
Copy link
Member

bruvzg commented Jul 8, 2023

Does this actually work for properties though? That's the issue with mutability, not that variant of it can be mutated

Yes and no, it will change the value of a variable in the object, but won't trigger any extra code in the setter function (same is true for any type which doesn't have changed signal, like Array or Dictionary - like #79166).

@AThousandShips
Copy link
Member

AThousandShips commented Jul 8, 2023

It won't change the value though, see the above code, I've verified that, and this is an established limitation

The only reason the [] operator works is because it uses a proxy

See: #70995

@bruvzg
Copy link
Member

bruvzg commented Jul 8, 2023

Well, if it's not desired and there are issues with GDScript (seems like a GDScript bug to me), I guess it's better to use #79156 instead. resize() + raw ptrw() definitely should be accessible for GDExtension, other methods of constructing strings are much slower.

@bruvzg
Copy link
Member

bruvzg commented Jul 8, 2023

The only reason the [] operator works is because it uses a proxy

Maybe it should not be exposed to scripting in this case?

@AThousandShips
Copy link
Member

AThousandShips commented Jul 8, 2023

Agreed, but exposing a method that is confusing in GDScript shouldn't be done, imo even with a warning it'll be annoying

Unsure how to expose resize just for GDExtension, unless you mean [] it works fine as far as I can tell

@dalexeev
Copy link
Member

dalexeev commented Jul 8, 2023

I agree that exposing size() can be confusing. I think this is a low-level detail that has no practical use for users.

Also I don't think we should expose resize() as is. I would expect resize() to change the length of the string, i.e. the number of characters without the null terminator. Maybe we can add a method with some other name (so as not to increase the inconsistency between GDScript and C++)? Like set_length() or resize_length()?

@AThousandShips
Copy link
Member

That still won't work correctly unless it's returning a string by copy instead of mutating

Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resize method, other than the mutability issues, needs to be implemented so it handles null termination correctly, the current version creates defective strings, test:

var s0 := "abcd"
var s1 := "ab"
s0.resize(3)
print(hash(s0))
print(hash(s1))

These will print different results

Haven't tested but this implies that there are some concerns with how accessing the raw string will work

In fact you will notice that the hash changes each time you run (more or less), implying it keeps going after the end of the string, this likely risks crashing if it doesn't catch a zero correctly, and if you see the hash function of String you'll see it uses null check to end the hash. This is because in this case "abcd\0" becomes "abc" instead of "ab\0"

Edit: It also breaks if you change the size to be larger, as it doesn't ensure zeros

It is also impossible to manually null-terminate strings that have been shortened, you can't do:

a.resize(2)
a[2] = ""

Though you can manually terminate strings that have been lengthened, though you still can't terminate them at the very end (where there's no null to begin with, as it is not zeroed)

@dsnopek
Copy link
Contributor Author

dsnopek commented Jul 8, 2023

Thanks, Everyone, for all the great discussion and insight!

I'm a little confused about the conclusion, though. Is the consensus that we should return to #79156? I wouldn't mind, I personally liked that approach better anyway. :-) Or, that we should make a better resize() that can be exposed to GDScript too?

@AThousandShips
Copy link
Member

It wouldn't be possible to do a "modify in place" resize for GDScript without massively changing the underlying implementation, without causing the confusion above

For GDExtension it depends on the usecase if it should do dedicated null handling IMO, the limitation here is that from GDScript it's cumbersome and weird to handle nulls internally without having them automatically (see above for the actually impossible situations), but GDExtension might be closer to bare metal to have the performance critical version

It would probably not be good to have the default method do any zeroing, as it would likely be expected to insert null elsewhere

@dsnopek
Copy link
Contributor Author

dsnopek commented Jul 8, 2023

Alright, I'll re-open PR #79156 and close this one!

@dsnopek dsnopek closed this Jul 8, 2023
@YuriSizov YuriSizov removed this from the 4.x milestone Dec 2, 2023
@dsnopek dsnopek deleted the expose-string-resize branch July 22, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants