
add length_utf16 validator #245

Open · wants to merge 1 commit into base: master


@DXist (Contributor) commented Mar 20, 2023

This PR adds a length_utf16 validator.

My project exposes data from Salesforce via a JSON Schema-based API. I want to validate field lengths the same way Salesforce does: by counting UTF-16 code units.

UTF-16 is the string representation used by JavaScript, Java, and Salesforce Apex, so I think this validator could be useful to others as well. A good use case is keeping backend and frontend length validation aligned.

An example of a mismatch between UTF-16 and Unicode code points: the symbol '𝔠' takes 2 UTF-16 code units but is still only 1 Unicode code point.
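For illustration, here is a minimal sketch of the three ways a Rust string's length can be measured, using only the standard library. The helper name length_measures is hypothetical; str::len, str::chars, and str::encode_utf16 are real std methods.

```rust
/// Hypothetical helper: returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units).
fn length_measures(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // str::len counts UTF-8 bytes
        s.chars().count(),        // one char per Unicode scalar value (code point)
        s.encode_utf16().count(), // characters outside the BMP become surrogate pairs
    )
}

fn main() {
    // '𝔠' is U+1D520 MATHEMATICAL FRAKTUR SMALL C, outside the Basic Multilingual Plane.
    let (bytes, code_points, utf16_units) = length_measures("𝔠");
    println!("UTF-8 bytes: {bytes}, code points: {code_points}, UTF-16 units: {utf16_units}");
    // prints: UTF-8 bytes: 4, code points: 1, UTF-16 units: 2
}
```

A validator that counts code points would accept '𝔠' under a limit of 1, while Salesforce, Java, or JavaScript would count it as 2.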

Should I put the implementation behind an optional length_utf16 feature?

@DXist mentioned this pull request on Mar 20, 2023
@Keats (Owner) commented Mar 20, 2023

I don't think it makes sense to add that to the library; it's better implemented as a custom validator.
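As a rough sketch of the custom-validator route, the check can be written as a plain function that takes the field value and returns a Result. To keep it self-contained, LengthError is a hypothetical stand-in for the crate's own error type, and validate_max_utf16 with its max parameter is an illustrative name, not part of the crate. Registering such a function with the crate's custom-validator attribute depends on the crate version, so that wiring is not shown.

```rust
/// Hypothetical stand-in for a validation error (not the crate's ValidationError).
#[derive(Debug, PartialEq)]
struct LengthError {
    code: &'static str,
    max: usize,
    actual: usize,
}

/// Hypothetical custom validator: rejects values longer than `max` UTF-16 code units.
fn validate_max_utf16(value: &str, max: usize) -> Result<(), LengthError> {
    let actual = value.encode_utf16().count();
    if actual > max {
        Err(LengthError { code: "length_utf16", max, actual })
    } else {
        Ok(())
    }
}

fn main() {
    // Two code points, four UTF-16 code units: fits a limit of 4.
    assert_eq!(validate_max_utf16("𝔠𝔠", 4), Ok(()));
    // Three code points, six UTF-16 code units: rejected.
    println!("{:?}", validate_max_utf16("𝔠𝔠𝔠", 4));
    // prints: Err(LengthError { code: "length_utf16", max: 4, actual: 6 })
}
```

For a fixed per-field limit, a one-line wrapper such as a function that calls validate_max_utf16(value, 80) would match a single-argument custom-validator signature.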

@LeoniePhiline commented

@Keats The need for a UTF-16 code unit length validator is very common (think of all web form handling), since the maxlength attribute of HTML form fields counts UTF-16 code units.

If the frontend counts UTF-16 code units and the backend counts UTF-8 code units, inconsistencies arise whenever a value contains characters whose encoded length differs between UTF-16 and UTF-8.

As a result, values that passed client-side validation are rejected by the server whenever their UTF-8 representation is longer than the browser's UTF-16 representation.
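The mismatch described above can be reproduced with a short sketch. The limit of 5 stands in for a hypothetical maxlength="5" form field, and both check functions are illustrative names: one counts UTF-16 code units as the browser does, the other counts UTF-8 bytes.

```rust
/// Hypothetical limit mirroring an HTML maxlength="5" attribute.
const MAX_LEN: usize = 5;

/// Client-side rule: HTML maxlength counts UTF-16 code units.
fn passes_browser_check(value: &str) -> bool {
    value.encode_utf16().count() <= MAX_LEN
}

/// Backend rule that counts UTF-8 bytes (str::len).
fn passes_byte_check(value: &str) -> bool {
    value.len() <= MAX_LEN
}

fn main() {
    // "h\u{e9}llo" is "héllo" with a precomposed é (2 UTF-8 bytes, 1 UTF-16 unit).
    for value in ["hello", "h\u{e9}llo", "€€€€€"] {
        println!(
            "{value:?}: utf16={} bytes={} browser_ok={} server_ok={}",
            value.encode_utf16().count(),
            value.len(),
            passes_browser_check(value),
            passes_byte_check(value),
        );
    }
}
```

All three values fit the browser's limit of 5 UTF-16 code units, but "héllo" is 6 UTF-8 bytes and "€€€€€" is 15, so the byte-counting server rejects both.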

```diff
@@ -81,6 +87,7 @@ impl Validator {
 Validator::Regex(_) => "regex",
 Validator::Range { .. } => "range",
 Validator::Length { .. } => "length",
+Validator::LengthUTF16 { .. } => "length_utf16",
```


Suggested change:

```diff
-Validator::LengthUTF16 { .. } => "length_utf16",
+Validator::LengthUtf16 { .. } => "length_utf16",
```

For consistency with Rust naming conventions, you might want to write Utf rather than UTF in identifiers.

E.g. https://doc.rust-lang.org/std/str/struct.EncodeUtf16.html

Contributor Author


I'll address the code style suggestions if adding an extra built-in validator type turns out to be what crate users want.

We could collect more comments and thumbs-up reactions on the PR description to gauge interest.

Owner


Yeah, I think it makes more sense to add a parameter to the length validator, as mentioned in #250; otherwise we just duplicate code that is 99% the same.
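The parameter approach could look roughly like the sketch below: a single length check whose counting rule is selected by a parameter, rather than a separate near-duplicate validator. The contents of #250 are not shown here, so LengthUnit, measure, and this validate_length signature are hypothetical names for illustration, not the crate's API.

```rust
/// Hypothetical unit selector for one parameterized length validator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LengthUnit {
    /// UTF-8 bytes (what str::len returns).
    Bytes,
    /// Unicode scalar values (what str::chars yields).
    Chars,
    /// UTF-16 code units (what HTML maxlength, JavaScript and Java count).
    Utf16,
}

/// Measures `value` in the requested unit.
fn measure(value: &str, unit: LengthUnit) -> usize {
    match unit {
        LengthUnit::Bytes => value.len(),
        LengthUnit::Chars => value.chars().count(),
        LengthUnit::Utf16 => value.encode_utf16().count(),
    }
}

/// One validator with optional bounds; only the counting rule varies.
fn validate_length(value: &str, min: Option<usize>, max: Option<usize>, unit: LengthUnit) -> bool {
    let len = measure(value, unit);
    min.map_or(true, |m| len >= m) && max.map_or(true, |m| len <= m)
}

fn main() {
    let value = "𝔠𝔠"; // 2 chars, 4 UTF-16 code units, 8 UTF-8 bytes
    for unit in [LengthUnit::Bytes, LengthUnit::Chars, LengthUnit::Utf16] {
        println!("{unit:?}: {}", validate_length(value, None, Some(3), unit));
    }
    // prints: Bytes: false, Chars: true, Utf16: false (one per line)
}
```

Keeping the unit as an enum means the bounds logic is written once and only the measurement differs, which is the overlap the duplicated validator would otherwise repeat.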

Labels: None yet
Projects: None yet

3 participants