performance of encodings (hex, base64, base64url) #128

Open
Uzlopak opened this issue Oct 23, 2023 · 7 comments

Contributor

Uzlopak commented Oct 23, 2023

Over the last few days I have been investigating the performance of hex and, especially, base64 and base64url encoding and decoding.

I added benchmarks in nodejs/node#50348.

base64 encoding uses the functionality from the base64 dependency.

base64 decoding does not use the base64 dependency. We have a custom implementation that decodes leniently, so whitespace in the input is ignored instead of causing an error.

base64url encoding is a custom implementation. So it is slower than it could be.

base64url decoding is a custom implementation. So it is slower than it could be.

hex encoding is a custom implementation. So it is slower than it could be.

hex decoding is a custom implementation. So it is slower than it could be.
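The encode and decode paths listed above can be compared with a rough micro-benchmark. The following is only a minimal sketch and is not the benchmark from nodejs/node#50348: the `timeIt` helper, the 64 KiB payload, and the iteration count are all assumptions chosen for illustration, and the numbers will vary with the Node.js version and the machine.

```javascript
// Hypothetical helper: runs fn repeatedly and reports the mean time per call.
function timeIt(label, fn, iterations = 200) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = process.hrtime.bigint() - start;
  const nsPerOp = Number(elapsedNs) / iterations;
  console.log(`${label}: ${nsPerOp.toFixed(0)} ns/op`);
  return nsPerOp;
}

// 64 KiB of arbitrary bytes (assumed size, not taken from the issue).
const payload = Buffer.alloc(64 * 1024, 0xab);
const results = {};

for (const enc of ['hex', 'base64', 'base64url']) {
  const text = payload.toString(enc);
  results[`${enc} encode`] = timeIt(`${enc} encode`, () => payload.toString(enc));
  results[`${enc} decode`] = timeIt(`${enc} decode`, () => Buffer.from(text, enc));
}
```

A loop like this only gives a coarse comparison between encodings; it does no warm-up and does not control for garbage collection.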

Maybe this is something to be implemented in simdutf?

@lemire
@anonrig

Member

lemire commented Oct 23, 2023

All good ideas/pointers.

It would be possible to design a base64 library specifically for the needs of Node.js. We could throw in base16 (hex) and so forth. Handling spaces efficiently is possible.

Member

lemire commented Dec 13, 2023

The base64 decoder is robust with respect to spaces but it seems to ignore any non-base64 character, actually...

See what the specification says...

Whitespace characters such as spaces, tabs, and new lines contained within the base64-encoded string are ignored.

But look...

> Buffer.from(' \(\(AA\(\\AA','base64')
<Buffer 00 00 00>
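The lenient behaviour shown here can be checked in a script, and a caller who wants to reject stray characters has to validate the input separately. The sketch below is one possible approach and not part of Node.js: `decodeBase64Strict` is a hypothetical helper, and it assumes padded, standard-alphabet base64 with ASCII whitespace allowed.

```javascript
// Buffer is a Node.js global, so no import is needed.
const reference = Buffer.from('AAAA', 'base64');        // plain input
const withSpaces = Buffer.from(' AA \n AA', 'base64');  // whitespace only
const withJunk = Buffer.from(' ((AA(AA', 'base64');     // '(' is not a base64 character

console.log(reference.equals(withSpaces), reference.equals(withJunk));

// Hypothetical strict wrapper (an assumption for illustration):
// allow ASCII whitespace, reject every other non-base64 character.
function decodeBase64Strict(input) {
  const compact = input.replace(/[ \t\r\n\f]/g, '');
  const wellFormed =
    compact.length % 4 === 0 && /^[A-Za-z0-9+/]*={0,2}$/.test(compact);
  if (!wellFormed) {
    throw new TypeError('input is not well-formed base64');
  }
  return Buffer.from(compact, 'base64');
}

let rejected = false;
try {
  decodeBase64Strict(' ((AA(AA');
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected);
```

The regular expression pass costs an extra scan of the input, so a wrapper like this trades speed for strictness.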

The hex decoder seems to stop on the first non-hex character:

> Buffer.from(' \(\(AA\(\\AA','hex')
<Buffer >
> Buffer.from('AAAA','hex')
<Buffer aa aa>

This seems documented:

Data truncation may occur when decoding strings that do not exclusively consist of an even number of hexadecimal characters
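The truncation described in the documentation can be reproduced with short inputs, and it can be avoided by validating the string before decoding. This is a minimal sketch: `decodeHexStrict` is a hypothetical helper introduced for illustration, not a Node.js API.

```javascript
// Buffer is a Node.js global, so no import is needed.
const oddLength = Buffer.from('AAA', 'hex');   // the trailing 'A' has no partner
const badPair = Buffer.from('AAZZAA', 'hex');  // decoding stops at the 'ZZ' pair

console.log(oddLength, badPair);

// Hypothetical strict wrapper (an assumption for illustration):
// throw instead of silently truncating.
function decodeHexStrict(input) {
  if (input.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(input)) {
    throw new TypeError('input is not an even-length hex string');
  }
  return Buffer.from(input, 'hex');
}

console.log(decodeHexStrict('AAAA'));
```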


aduh95 commented Dec 13, 2023

> See what the specification says...

What specification? AFAIK Buffer is a Node.js API, the only "specification" would be Node.js docs.

Member

lemire commented Dec 14, 2023

> What specification? AFAIK Buffer is a Node.js API, the only "specification" would be Node.js docs.

Yes. I quoted the documentation.

Member

lemire commented Mar 16, 2024

Base64 support is coming soon in simdutf: simdutf/simdutf#375

Member

lemire commented Apr 8, 2024

atob performance has been greatly improved by @anonrig in nodejs/node#52381.

So this handles part of the issue.

Member

lemire commented Apr 8, 2024

@anonrig is handling part of the rest of the issue in nodejs/node#52428
