Optimize DecodeString and Decode methods. #1

kevina · 2016-11-18T07:17:49Z

In the DecodeString case it avoids unnecessary intermediate strings.
In the Decode case it just avoids bytes.Map, which is likely slow due
to having to make a function call for each character.

In the benchmark in ipfs/kubo#3376 is was able to save about 40 ms of cpu time.

In the DecodeString case it avoids unnecessary intermediate strings. In the Decode case it just avoids bytes.Map, which is likely slow due to having to make a function call for each character.

kevina · 2016-11-18T07:19:17Z

Note: I manually ran go test ./... and everything passed.

whyrusleeping · 2016-11-19T01:06:11Z

base32.go

-	n, _, err = enc.decode(dst, src)
+func (enc *Encoding) Decode(dst, s []byte) (n int, err error) {
+	stripped := make([]byte, 0, len(s))
+	for i := 0; i < len(s); i++ {


use for _, c := range s {. Its faster because it optimizes out bounds checks.

I can fix this one. But see below.

whyrusleeping · 2016-11-19T01:06:57Z

base32.go

 	return
 }

 // DecodeString returns the bytes represented by the base32 string s.
 func (enc *Encoding) DecodeString(s string) ([]byte, error) {
-	s = strings.Map(removeNewlinesMapper, s)
+	stripped := make([]byte, 0, len(s))
+	for i := 0; i < len(s); i++ {


same comment as above

The problem is it returns a rune and not a byte when using range for a string. Is there a way to do use range on the raw bytes or a string.

You would have to basically cast it to a []byte first. At which point you might as well just call Decode([]byte(s)) in the first place.

Right. So I was trying to avoid any necessary allocation so is this one okay as is?

heh, one thing you could do is convert the string to bytes, then overwrite that same array.

I could do that. It will require an extra pass and seams like an overkill. Eventually the go compiler should be smart enough to eliminate the bounds checks. Most other optimizing compilers will in simple cases like this.

Do you really think it could make that much of a difference? If so I will benchmark it and let you know.

It wont take an extra pass:

strb := []byte(s) off := 0 for i, b := range strb { if b == '\n' || b == '\r' { continue } strb[off] = b off++ } n, _, err := enc.decode(strb, strb[:off]) if err != nil { return nil, err } return strb[:n], nil

Yes it will. The extra pass is in the conversion from the string to a []byte.

I think this is just adding complexity but since you seam to really want it I will run some benchmarks and report back. It may be tomorrow through before I do it.

converting a string to a byte array is pretty well optimized. If youre going to count that as an extra pass then you have to count allocating a byte buffer as an extra pass as go zeros every buffer you allocate.

Like I said, I will run some benchmarks. In many cases you can allocate zeroed memory without require a pass to zero it.

kevina · 2016-11-19T01:45:22Z

@whyrusleeping Okay, I used "range" in Decode() but left DecodeString() alone.

If you are still not happy with this i can run some more tests to see if the index checking is really costing anything and if it would be better to convert to a []byte first.

Thanks

For whyrusleeping#1.

kevina · 2016-11-19T23:10:18Z

@whyrusleeping I ran some benchmarks. It turns out the index check is either optimized out or otherwise does not matter. Your version is faster because it only allocates once instead of twice (something that I missed yesterday, sorry), not because it avoids the index check. It also turns out that for i,c := range []byte(s) {} is optimized to avoid allocating an array (see https://github.com/golang/go/wiki/CompilerOptimizations#range-over-bytes ) Here are some numbers from the benchmark branch at https://github.com/kevina/base32 (commit 84ef217)

$ go test -bench 'BenchmarkKeyDecode.*'
BenchmarkKeyDecodeKevOrig-4      3000000               513 ns/op
BenchmarkKeyDecodeKevNew-4       3000000               449 ns/op
BenchmarkKeyDecodeOpt-4          3000000               452 ns/op
BenchmarkKeyDecodeDeOpt-4        3000000               553 ns/op
BenchmarkKeyDecodeWhy-4          3000000               470 ns/op
BenchmarkKeyDecodeFinal-4        3000000               482 ns/op

KevOrig is my orignal version, KevNew avoids an extra allocation, 'Opt' takes advantage of the for i,c := range []byte(s) {}optimization. DeOpt uses b := byte(s); for i,c := range b {} instead. Why is your original version. Final is a slightly refactored version of yours.

Do this by calling the internal decode using the same byte array for the source and dest.

kevina · 2016-11-19T23:21:40Z

I just pushed a commit that uses the version of DecodeString uses by the DecodeFinal benchmark tests.

I'm fine if we just go with that (basically your version). The time between the different versions (that only use a single alloc) is very small and various by 20ns each time I run it.

For whyrusleeping#1.

kevina · 2016-11-21T19:24:23Z

For compassion I added the original implementation in the benchmarking code (a5bba06). Here are the numbers from another run

BenchmarkKeyDecodeOrig-4         2000000               892 ns/op
BenchmarkKeyDecodeKevOrig-4      3000000               498 ns/op
BenchmarkKeyDecodeKevNew-4       3000000               434 ns/op
BenchmarkKeyDecodeOpt-4          3000000               433 ns/op
BenchmarkKeyDecodeDeOpt-4        3000000               528 ns/op
BenchmarkKeyDecodeWhy-4          3000000               462 ns/op
BenchmarkKeyDecodeFinal-4        3000000               464 ns/op

Orig is the original implementation.

whyrusleeping · 2016-11-22T03:14:44Z

This LGTM, thanks @kevina!

Optimize DecodeString and Decode methods.

b8121db

In the DecodeString case it avoids unnecessary intermediate strings. In the Decode case it just avoids bytes.Map, which is likely slow due to having to make a function call for each character.

This was referenced Nov 18, 2016

Suggestion to avoid exra layer of indirection by using a more compact representation of Cid ipfs/go-cid#3

Closed

base32 optimization whyrusleeping/todo#30

Closed

whyrusleeping reviewed Nov 19, 2016

View reviewed changes

Use "range" instead of indexes in Decode().

06d804a

kevina added a commit to kevina/base32 that referenced this pull request Nov 19, 2016

Benchmark different versions of DecodeString

84ef217

For whyrusleeping#1.

Only allocate a single string/[]byte in DecodeString.

f4e78d0

Do this by calling the internal decode using the same byte array for the source and dest.

kevina added a commit to kevina/base32 that referenced this pull request Nov 21, 2016

Add original implementation for benchmarking.

a5bba06

For whyrusleeping#1.

whyrusleeping merged commit 38ece81 into whyrusleeping:master Nov 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize DecodeString and Decode methods. #1

Optimize DecodeString and Decode methods. #1

kevina commented Nov 18, 2016

kevina commented Nov 18, 2016

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016 •

edited

Loading

whyrusleeping Nov 19, 2016

kevina Nov 19, 2016

kevina commented Nov 19, 2016

kevina commented Nov 19, 2016

kevina commented Nov 19, 2016 •

edited

Loading

kevina commented Nov 21, 2016

whyrusleeping commented Nov 22, 2016

Optimize DecodeString and Decode methods. #1

Optimize DecodeString and Decode methods. #1

Conversation

kevina commented Nov 18, 2016

kevina commented Nov 18, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevina Nov 19, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kevina commented Nov 19, 2016

kevina commented Nov 19, 2016

kevina commented Nov 19, 2016 • edited Loading

kevina commented Nov 21, 2016

whyrusleeping commented Nov 22, 2016

kevina Nov 19, 2016 •

edited

Loading

kevina commented Nov 19, 2016 •

edited

Loading