Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WireLib.ParseEscapes #2752

Merged
merged 16 commits into from
Sep 18, 2023
Merged

Conversation

Denneisk
Copy link
Member

@Denneisk Denneisk commented Sep 16, 2023

New WireLib function ParseEscapes. Takes a string input generated by the user (such as from a TextEntry panel) and parses escape characters out. The output is the escaped string. This is a complete implementation of Luajit's escape sequences.

Purpose:
Some user inputs have their own implementation of escaping characters. This is meant to be a unified and thorough alternative and supports the entire Lua escape sequence library.

Notes:
This differs from Lua's escaping only in that invalid escape sequences are skipped instead of erroring.

I've only implemented this in constant value more as a demonstration than anything. Any user input should be changed to use this if desired. E2's tokenizer, for example, is a noteworthy target.

lua/wire/server/wirelib.lua Outdated Show resolved Hide resolved
A lot cleaner this way also.
@Denneisk
Copy link
Member Author

Denneisk commented Sep 16, 2023

Noticed E2 string literals could advantage from this. Not going to touch it until #2555 is done though.

Expose new function WireLib.HexToNum (this should be a Garry's feature imo)
@Denneisk
Copy link
Member Author

Denneisk commented Sep 16, 2023

Feels a bit uncomfortable adding any more passes. Also I added WireLib.HexToNum because I thought it could also be useful but let me know if that's overstepping.

@thegrb93
Copy link
Contributor

HexToNum can just be tonumber(x,16)

@Denneisk
Copy link
Member Author

Of course. I knew I was missing something.

@Denneisk
Copy link
Member Author

The implementation of this keeps bugging me, obvious global functions or not. I'm going to draft it until it feels right.

@Denneisk Denneisk marked this pull request as draft September 16, 2023 22:09
Try adding unicode to that, I dare you.
Added Lua unicode support
@Denneisk
Copy link
Member Author

That's not even the multi-pass gsub. I'm going to bed.

@thegrb93
Copy link
Contributor

I'd like to see an implementation more like

local singleCharEscapes =
{
	n = "\n", r = "\r", t = "\t", ["\\"] = "\\",
	["'"] = "'", ["\""] = "\"", a = "\a",
	b = "\b", f = "\f", v = "\v"
}

local multiCharEscapes =
{
	x = function(arg)
		
	end
}

str = string.gsub(str, "(\\(.?)([^\\]?[^\\]?))", function(str, i, arg)
	return singleCharEscapes[i] or (multiCharEscapes[i] and multiCharEscapes[i](arg)) or str
end)

@Denneisk
Copy link
Member Author

Not as simple against unicode but I'll see what I can do.

@Denneisk
Copy link
Member Author

Denneisk commented Sep 17, 2023

I can only feasibly get that idea to make sense by still having unicode parsed separately, and in that case it only parses two (or 10) multi conditions which is \x and \0-9.

@Denneisk
Copy link
Member Author

I think I'll settle for this which derives from your idea but inlines each case, which looks almost like what I started with... huh.

@Denneisk Denneisk marked this pull request as ready for review September 17, 2023 17:29
@Denneisk Denneisk changed the title Add WireLib.ParseString Add WireLib.ParseEscapes Sep 17, 2023
@thegrb93
Copy link
Contributor

You can't do two passes otherwise stuff like \\u will get parsed by the unicode despite it should parse to \u

@Denneisk
Copy link
Member Author

Ah, you're right. Sorry, I'm not being thorough enough.

@Denneisk
Copy link
Member Author

I tried your suggestion again and realized it was simpler than I had thought. The only thing I'd like to know is if you wouldn't mind having it inlined instead.

@thegrb93
Copy link
Contributor

Looks pretty nice. I don't mind if its inlined.

Denneisk and others added 2 commits September 17, 2023 21:56
Fixed case where escape character would omit trailing text
Fixed case where \x would omit trailing text
@thegrb93
Copy link
Contributor

One last thing that we might want to be done is to benchmark it and limit the string length depending on how much it can do. Like maybe feed it a string.rep("\ ", 100)

@Denneisk
Copy link
Member Author

Denneisk commented Sep 18, 2023

Hex should be specifically two characters long to match the Lua standard.

@thegrb93
Copy link
Contributor

All right, sure

@Denneisk
Copy link
Member Author

Denneisk commented Sep 18, 2023

On my 2015 CPU in a server filled with sents, benchmark is between 0.001 to 0.004 seconds with 27075 semi-randomized characters and sequences with roughly 6290 matches. Even string.reping it to ~31440 matches is almost always less than a tick (0.009 seconds average). So, really inconsequential.
Using an escapeless string of 88064 chars is generally under 0.001 seconds.

@thegrb93 thegrb93 merged commit b43c615 into wiremod:master Sep 18, 2023
1 check passed
@Denneisk
Copy link
Member Author

And I was just about to include it in the tokenizer. Oh well, another time.

@Denneisk Denneisk deleted the wirelib-parsestring branch April 14, 2024 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants