Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for operating on byte strings #15

Merged
merged 2 commits into from
Sep 5, 2023
Merged

Add support for operating on byte strings #15

merged 2 commits into from
Sep 5, 2023

Conversation

danielparks
Copy link
Contributor

@danielparks danielparks commented May 28, 2023

This adds Shlex::new_bytes(), quote_bytes(), and join_bytes(). This adds a bytes submodule for operating on byte strings that might contain invalid UTF-8. Where possible I have switched the functions that operate on str to use the _bytes functions internally to avoid duplicating code and eliminate the potential for differing behavior between the two functions.

It includes trivial tests that confirm that the bytes version of functions actually work on invalid UTF-8.

I have not updated the documentation; this is just a first draft to solicit feedback.

Fixes #12

Update: Fixed some problems I mentioned in the comments. Updated the description above; italics are new.

@danielparks
Copy link
Contributor Author

This doesn’t change the MSRV. cargo msrv says it’s 1.36.0 for both the master branch and this one.

I’m also reasonably confident that this doesn’t change any existing behavior.

@danielparks
Copy link
Contributor Author

Oh, oops. I didn’t actually deal with Shlex::next() returning a String… and of course as is String::from_utf8_unchecked() isn’t safe in this PR because you can run it on an arbitrary byte string.

I’ll deal with that stuff tomorrow.

@danielparks
Copy link
Contributor Author

danielparks commented May 29, 2023

Ok. This should be basically functional now. It passes cargo test --no-default-features and cargo test, and cargo msrv still returns 1.36.0.

Still to do:

  • Update documentation and README.
  • If it’s not too terrible, implement shlex::Shlex with shlex::bytes::Shlex.
  • Check ergonomics with OsString and OsStr, since that’s probably what most people will want to use it with.

@danielparks
Copy link
Contributor Author

I added a commit to implement Shlex with bytes::Shlex. It used Deref and DerefMut to ensure that accessing the fields (e.g. line_no) still work.

Using OsStr requires std::os::unix::ffi::OsStrExt, but otherwise it doesn’t seem like a problem for users. OsString is basically similar.

@danielparks
Copy link
Contributor Author

Ok, documentation, README and CHANGELOG updated! Ping @comex, let me know what you think. I generally tried to match your style, but I imagine I diverged a bit, especially in the documentation. Happy to change things.

MSRV remains 1.36.0.

@danielparks danielparks marked this pull request as ready for review July 3, 2023 18:37
@danielparks danielparks changed the title Draft: Add support for operating on byte strings Add support for operating on byte strings Jul 3, 2023
@fenhl fenhl requested review from comex and fenhl July 3, 2023 18:58
Copy link
Collaborator

@fenhl fenhl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of quote_internal and the Option wrapping by implementing crate::quote like this:

/// Given a single word, return a string suitable to encode it as a shell argument.
pub fn quote(in_str: &str) -> Cow<str> {
    match bytes::quote(in_str.as_bytes()) {
        Cow::Borrowed(out) => {
            // Safety: bytes::quote() was given valid UTF-8, so it will
            // always return valid UTF-8.
            unsafe { core::str::from_utf8_unchecked(out) }.into()
        }
        Cow::Owned(out) => {
            // Safety: bytes::quote() was given valid UTF-8, so it will
            // always return valid UTF-8.
            unsafe { String::from_utf8_unchecked(out) }.into()
        }
    }
}

Or am I missing another reason it's done this way?

src/lib.rs Outdated Show resolved Hide resolved
@danielparks
Copy link
Contributor Author

That is nicer. In other projects I don’t use the _unchecked functions, so I have to use Option to avoid doing extra work. Of course, that isn’t necessary here.

This adds a `bytes` submodule for operating on byte strings that might
contain invalid UTF-8. Where possible I have switched the functions that
operate on `str` to use the `bytes` functions internally to avoid
duplicating code and eliminate the potential for differing behavior
between the two functions.

It includes trivial tests that confirm that the `bytes` version of
functions actually work on invalid UTF-8.

Fixes #12
@fenhl fenhl merged commit f44b62e into comex:master Sep 5, 2023
3 checks passed
@fenhl
Copy link
Collaborator

fenhl commented Sep 5, 2023

Thank you for your contribution! This has now been released as version 1.2.0.

@danielparks danielparks deleted the bytes branch September 5, 2023 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support operating on bytes
2 participants