Incorrect result for Buffer#toString #6075

kuldeepaggarwal · 2016-04-06T10:18:43Z

Version: v5.10.1
Platform: Darwin KD 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep 1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64
Subsystem: Buffer.js

Hello Team,

I am using an older version (0.10.40) in one of my projects and am facing an issue when grouping values of type Buffer. The issue is also present in the latest version.

Issue

this.process.version // => 'v5.10.1'
var a = new Buffer([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192])
var b = new Buffer([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192])
a.toString() // => 'ل-�Mo\u0011��y\u0000PV;1�'
b.toString() // => 'ل-�Mo\u0011��y\u0000PV;1�'
a.toString('hex') // => 'd9842d8a4d6f11e48a790050563b31c0'
b.toString('hex') // => 'd9842db44d6f11e48a790050563b31c0'

If you look carefully, the hex values of the two buffers differ (the fourth byte is 8a in one and b4 in the other), but their UTF-8 string values are exactly the same, which is what is causing the problem for us.
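A minimal sketch of the comparison above, assuming a current Node.js release (it uses `Buffer.from`, the replacement for the deprecated `new Buffer(array)` constructor). Comparing the decoded UTF-8 strings hides the difference; comparing the bytes does not.

```javascript
// Same bytes as in the report; only index 3 differs (0x8a vs 0xb4).
const a = Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);
const b = Buffer.from([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);

// The no-argument toString() decodes as UTF-8, so both invalid bytes become U+FFFD.
console.log(a.toString() === b.toString()); // true

// Byte-level comparisons see the difference.
console.log(a.toString('hex') === b.toString('hex')); // false
console.log(a.equals(b));                             // false
console.log(Buffer.compare(a, b));                    // -1 (0x8a sorts before 0xb4)
```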

Actual Use Case

var _ = require('lodash');

var users = [
  {
    uuid: new Buffer([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]),
  },
  {
    uuid: new Buffer([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]),
  }
];

var posts = [
  { title: 'First Post', user_uuid: new Buffer([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]) },
  { title: 'Second Post', user_uuid: new Buffer([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]) }
];

// In our code this key function lives in another library, e.g. Bookshelf.
_.groupBy(posts, function(post) {
  return post.user_uuid;
});
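To reproduce the grouping problem without Lodash, the sketch below uses a hypothetical `groupBy(items, keyFn)` helper that, like Lodash's `_.groupBy`, stores groups on a plain object, so every key is converted to a string property name. A Buffer key is converted through its no-argument `toString()`, i.e. as UTF-8. Returning an explicit hex string from the key function keeps the two UUIDs apart.

```javascript
// Hypothetical stand-in for _.groupBy: groups live on a plain object,
// so each key returned by keyFn is converted to a string property name.
function groupBy(items, keyFn) {
  const groups = {};
  for (const item of items) {
    const key = keyFn(item); // a Buffer key is stringified via buf.toString() (UTF-8)
    if (!groups[key]) groups[key] = [];
    groups[key].push(item);
  }
  return groups;
}

const posts = [
  { title: 'First Post', user_uuid: Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]) },
  { title: 'Second Post', user_uuid: Buffer.from([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]) },
];

// Key is the Buffer itself: both UUIDs stringify to the same UTF-8 text.
const byBuffer = groupBy(posts, (post) => post.user_uuid);
console.log(Object.keys(byBuffer).length); // 1

// Key is an explicit hex string: distinct bytes give distinct keys.
const byHex = groupBy(posts, (post) => post.user_uuid.toString('hex'));
console.log(Object.keys(byHex).length); // 2
```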

Expected Result

There should be two keys, because the user_uuid values of the two posts are different.

Actual Result

All the posts are grouped under the same key, because #toString() returns the same value for both Buffer objects.

Fix

Buffer#toString should output hex by default.

Please let me know if I am on the wrong path or have misunderstood something, or if this should be fixed in the other libraries instead. If you think I am on the right path and it should be fixed here, I can raise a PR for it.
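Independently of what the default is, a caller can already choose a lossless encoding. The sketch below (assuming a current Node.js release) checks which encodings round-trip arbitrary bytes through a string: `'hex'` and `'base64'` do, while `'utf8'` does not once the input contains invalid sequences. `roundTrips` is a hypothetical helper name.

```javascript
const uuid = Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);

// Hypothetical helper: encode to a string, decode back, and compare bytes.
// A lossless encoding restores exactly the original buffer.
function roundTrips(buf, encoding) {
  return Buffer.from(buf.toString(encoding), encoding).equals(buf);
}

for (const encoding of ['hex', 'base64', 'utf8']) {
  console.log(encoding, roundTrips(uuid, encoding));
}
// hex true
// base64 true
// utf8 false  (each U+FFFD re-encodes as ef bf bd, not the original byte)
```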


bnoordhuis · 2016-04-06T10:29:37Z

The issue is that both input buffers contain invalid character sequences that get substituted with the replacement character, U+FFFD. That's why the UTF-8 strings are the same - the replacements are in the same locations - but the hexadecimal representation is not.
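The substitution can be observed directly. A minimal sketch, assuming a current Node.js release: collect the string indices at which U+FFFD appears in each decoded buffer. It does not hard-code how many replacement characters the decoder emits; it only shows that both strings have them at the same indices. `replacementIndices` is a hypothetical helper name.

```javascript
const a = Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);
const b = Buffer.from([217, 132, 45, 180, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);

// Hypothetical helper: UTF-16 indices at which the decoder inserted U+FFFD.
function replacementIndices(str) {
  const indices = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] === '\uFFFD') indices.push(i);
  }
  return indices;
}

const ia = replacementIndices(a.toString());
const ib = replacementIndices(b.toString());
console.log(ia, ib);
// The lone bytes 138 (in a) and 180 (in b) both become U+FFFD at index 2,
// right after 'ل' (index 0) and '-' (index 1).
console.log(ia[0] === 2 && ib[0] === 2); // true
```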

I'll close this; Buffer#toString() is working as expected and as documented in this case.

kuldeepaggarwal · 2016-04-06T11:20:49Z

@bnoordhuis I don't understand where the input buffers contain invalid characters. Can you please point out where the input is wrong?

bnoordhuis · 2016-04-06T15:04:23Z

For example, in the sequence [217, 132, 45, 138, 77...], 138 is not a valid starting point for a UTF-8 character sequence because those always have the two most significant bits set (EDIT: except for single-byte characters, of course.) 138 & 192 is 128 when it should be 192 (because 192 == 128 + 64.)
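The bit test above can be turned into a small scanner. This is a simplified sketch, not a full UTF-8 validator: it only checks lead-byte and continuation-byte bit patterns and does not reject overlong encodings, surrogates, or code points above U+10FFFF. The function names are hypothetical.

```javascript
// Continuation bytes look like 10xxxxxx: (byte & 0xc0) === 0x80.
function isContinuation(byte) {
  return (byte & 0xc0) === 0x80;
}

// Sequence length implied by a lead byte, or 0 if the byte cannot start a character.
function sequenceLength(lead) {
  if ((lead & 0x80) === 0x00) return 1; // 0xxxxxxx: single-byte (ASCII)
  if ((lead & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((lead & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((lead & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0; // 10xxxxxx (stray continuation byte) or 11111xxx
}

// Index of the first byte whose sequence is malformed, or -1 if none is found.
function firstInvalidUtf8(buf) {
  let i = 0;
  while (i < buf.length) {
    const len = sequenceLength(buf[i]);
    if (len === 0) return i;
    for (let k = 1; k < len; k++) {
      if (i + k >= buf.length || !isContinuation(buf[i + k])) return i;
    }
    i += len;
  }
  return -1;
}

const a = Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);
console.log(138 & 192);           // 128, i.e. 10xxxxxx: a continuation byte
console.log(firstInvalidUtf8(a)); // 3, the position of byte 138
```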

kuldeepaggarwal · 2016-04-06T15:15:25Z

I apologize, but I still don't understand the concept. Can you please point me to a reference where I can read about Buffer in detail, so that I can understand what you mean?

I might be missing something obvious here.

bnoordhuis · 2016-04-06T15:26:04Z

The no-argument version of buf.toString() interprets the bytes in the buffer as UTF-8 and turns that into a string. One or more bytes make up a character; https://en.wikipedia.org/wiki/UTF-8#Description explains what those byte sequences look like. Not all sequences are valid; those are replaced with a U+FFFD character.
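When silently substituting U+FFFD is not acceptable, the input can be validated instead. A minimal sketch, assuming Node.js 11 or later, where `TextDecoder` is a global (older versions expose it as `require('util').TextDecoder`): by default the WHATWG decoder also substitutes U+FFFD, but with `fatal: true` it throws on invalid UTF-8. `decodeStrict` is a hypothetical helper name.

```javascript
const bytes = Buffer.from([217, 132, 45, 138, 77, 111, 17, 228, 138, 121, 0, 80, 86, 59, 49, 192]);

// Default (non-fatal) decoding: invalid sequences become U+FFFD, like buf.toString().
const lenient = new TextDecoder('utf-8').decode(bytes);
console.log(lenient.includes('\uFFFD')); // true

// Hypothetical helper: fatal decoding rejects invalid input instead of replacing it.
function decodeStrict(buf) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(buf);
  } catch (err) {
    return null; // null signals "not valid UTF-8"
  }
}

console.log(decodeStrict(bytes));               // null
console.log(decodeStrict(Buffer.from('ل-Mo'))); // 'ل-Mo'
```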

Hope that clears it up.

kuldeepaggarwal · 2016-04-07T04:16:38Z

Thanks a lot @bnoordhuis 💙 💛 💚 💜

bnoordhuis closed this as completed Apr 6, 2016

bnoordhuis added the invalid and buffer labels Apr 6, 2016

jgehrcke mentioned this issue Nov 28, 2019 in docs: Buffer.toString(): add note about invalid data #30706 (closed)

snyk-bot mentioned this issue Jul 17, 2020 in [Snyk] Security upgrade eslint from 3.19.0 to 4.0.0 adamlaska/node#38 (open)
