Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String append / concat #1280

Open
3 tasks
leighmcculloch opened this issue Jun 4, 2024 · 9 comments
Open
3 tasks

String append / concat #1280

leighmcculloch opened this issue Jun 4, 2024 · 9 comments

Comments

@leighmcculloch
Copy link
Member

@willemneal mentioned today that the String type has no concatenation functionality and replicating it with the existing SDK API is pretty complex because the developer has to use an allocator.

We could explore how to add this to the SDK without a protocol change, although very likely a protocol change is required here.

This issue probably requires:

  • Investigation / spike into how we could do concat without alloc. (It's unreasonable to require alloc in an SDK function.)
  • Do an audit of Soroban Env host functions for String and Bytes to see if we have a lot of difference in their API in attempt to keep them more similar as part of anything proposed.
  • Propose new host functions if needed for String.
@willemneal
Copy link
Member

Currently String has a copy_into_slice. So you can create a Vec as the same length as the combined strings. Then copy each into it and convert into String.

@jayz22
Copy link
Contributor

jayz22 commented Jun 6, 2024

String can also convert to and from bytes, and there is bytes_append host function, maybe that's easier?

@leighmcculloch
Copy link
Member Author

Ah yes, that'd be a much more preferred approach. Thanks @jayz22 !

@leighmcculloch
Copy link
Member Author

@jayz22 How can String convert to and from Bytes? I don't see any existing SDK functionality for converting between them, and I don't see any host functions for doing so either.

@jayz22
Copy link
Contributor

jayz22 commented Jul 24, 2024

@jayz22 How can String convert to and from Bytes? I don't see any existing SDK functionality for converting between them, and I don't see any host functions for doing so either.

Actually it would be easier since String can directly converted to/from raw bytes via the linear memory, so one way could be

str1->string_copy_to_slice()
str2->string_copy_to_slice()
string_new_from_slice(concatenated_slice) -> str3

no need to go through the Bytes.

A unit test on the host side would be:

#[test]
fn string_concat() -> Result<(), HostError> {
    let host = Host::default();
    let str1 = "abc";
    let str2 = "def";
    let str1_obj = host.string_new_from_slice(str1.as_bytes())?;
    let str2_obj = host.string_new_from_slice(str2.as_bytes())?;

    let mut concat_slice = vec![0; str1.len() + str2.len()];
    host.string_copy_to_slice(str1_obj, Val::from_u32(0), &mut concat_slice[0..str1.len()])?;
    host.string_copy_to_slice(str2_obj, Val::from_u32(0), &mut concat_slice[str1.len()..])?;
    let str3 = host.string_new_from_slice(&concat_slice)?;

    let str_ref = host.string_new_from_slice(b"abcdef")?;
    assert_eq!(host.obj_cmp(str3.to_val(), str_ref.to_val())?, 0);

    Ok(())
}

@leighmcculloch
Copy link
Member Author

Writing the bytes into local memory requires alloc, or requires using a buffer repeatedly. The main goal of this issue is to not use alloc. Using a buffer is an option, but an expensive one I think.

@jayz22
Copy link
Contributor

jayz22 commented Jul 25, 2024

Maybe I'm missing something, but why would copying via linear memory expensive? The linear memory routines are designed for these type of repeated/large number of bytes manipulation.
The bytes_append host function also needs to copy both bytes objects into an internal slice and create a new object from it. So I think performance wise it should be equivalent (minus a couple of host function invocation costs).

@leighmcculloch
Copy link
Member Author

leighmcculloch commented Jul 25, 2024

@jayz22 To add an append function to the SDK it needs to pick some arbitrary buffer size, then for loop over copying the bytes to the slice on buffer at a time, then back to the host. It seems like a lot of repetitive bytes pushing across the boundary for something that could occur entirely host side. Depending on what we pick as a suitable buffer size, and depending on the input size, it could be multiple host fn calls, and it's 2 host fn calls per buffer filled.

Maybe we don't need to optimise this now, but this seems like the sort of thing host fns should do entirely host side.

@jayz22
Copy link
Contributor

jayz22 commented Jul 29, 2024

Yeah you are right, that would require picking a buffer size or using an allocator.
In that case we should add some helper host functions for this, and possibly alone with other ones being proposed stellar/rs-soroban-env#1337 and stellar/rs-soroban-env#1101

  • string append
  • string to/from_bytes
  • string/bytes search

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@leighmcculloch @willemneal @jayz22 and others