IncludeExclude does not need formatter when converting longs #38739

polyfractal · 2019-02-11T16:05:31Z

Using the formatter converts longs (wrapped in a BytesRef) to a double, then rounds/truncates back to a long. This loses precision for large longs.

I might be missing something, but I think we should instead just convert the string value directly to a long.
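To make the precision loss concrete, here is a minimal sketch in plain Java. It does not use the actual formatter code; `viaDouble` only mimics the long -> double -> long round trip described above, and the class and method names are made up for illustration.

```java
final class LongPrecisionDemo {
    // Mimics the lossy path: parse the string as a double, then cast back to a long.
    static long viaDouble(String value) {
        return (long) Double.parseDouble(value);
    }

    // Direct conversion: keeps all 64 bits of the long.
    static long direct(String value) {
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        // 2^53 + 1 is the smallest positive integer a double cannot represent exactly.
        String big = "9007199254740993";
        System.out.println(direct(big));    // 9007199254740993
        System.out.println(viaDouble(big)); // 9007199254740992
    }
}
```

Any long whose magnitude exceeds 2^53 can be affected in the same way, which is why an include/exclude filter on large longs may match the wrong value.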

The disadvantage is that a floating-point value will now throw a NumberFormatException, but it looks like this shouldn't be a problem. The only two usages are in the terms and sigterms aggs, and both check whether the number is a float first and use convertToDoubleFilter() in that case.
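A small sketch of that trade-off, using only `Long.parseLong` from the JDK (the helper name `parsesAsLong` is hypothetical): strings with a fractional part or an exponent are rejected instead of being silently truncated.

```java
final class StrictLongParse {
    // Returns true only if the string is a plain base-10 long.
    static boolean parsesAsLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            // Floats such as "1.5" or "1e3" end up here rather than being truncated.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsLong("42"));  // true
        System.out.println(parsesAsLong("1.5")); // false
    }
}
```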

We could fall back to the old behavior if a NumberFormatException is thrown, but this feels wrong to me since we should only be dealing with longs in the long filter, and silently truncating precision is worse than just throwing an exception imo.

Closes #38692

Using the formatter converts longs (inside a BytesRef) to a double, then rounds/truncates back to a long. This loses precision for large longs. We should instead just convert the string value directly to a long.

elasticmachine · 2019-02-11T16:05:34Z

Pinging @elastic/es-analytics-geo


polyfractal · 2019-02-11T22:14:10Z

Heh, well, I see what I was missing: dates (and presumably IPs, decimals, etc) fail due to NumberFormatExceptions now.

I pushed a fix where we fall back to the formatter for non-RAW formatters. It's not super elegant, but I wanted to minimize the fallout from a change like this. Open to suggestions.
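Roughly, the fallback has the following shape. This is a sketch under stated assumptions, not the code from the commit: `ValueFormat`, `RAW`, `ISO_DATE`, and `toLong` are hypothetical stand-ins and do not reflect the real Elasticsearch formatter API.

```java
import java.time.LocalDate;

final class FormatFallbackSketch {
    // Hypothetical stand-in for a doc-value formatter; not the Elasticsearch API.
    interface ValueFormat {
        long parseLong(String value);
    }

    // Hypothetical RAW format: goes through a double, like the lossy path in this PR.
    static final ValueFormat RAW = value -> (long) Double.parseDouble(value);

    // Hypothetical date format: ISO date string to days since 1970-01-01.
    static final ValueFormat ISO_DATE = value -> LocalDate.parse(value).toEpochDay();

    static long toLong(String value, ValueFormat format) {
        if (format == RAW) {
            // RAW values are plain numbers, so bypass the formatter and keep full precision.
            return Long.parseLong(value);
        }
        // Dates, IPs, decimals, etc. still need the formatter to interpret the string.
        return format.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(toLong("9007199254740993", RAW)); // 9007199254740993
        System.out.println(toLong("1970-01-02", ISO_DATE));  // 1
    }
}
```

The identity check on `RAW` is the inelegant part: the caller has to know which formats are safe to bypass.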

cyberhuman · 2019-02-12T01:01:53Z

Shouldn't the RAW formatter's parseLong be updated instead? I'm not sure where else it's used, but I imagine its users expect 64-bit long values to be supported.

polyfractal · 2019-02-12T02:05:56Z

Possibly! I'm not crazy about the currently proposed PR, but admit that I was partially waiting to see if Adrien had an opinion when he got back as well :)

I feel like there may be some context here that I'm missing. E.g. I suspect there is (or potentially was but has since been removed) another place where floats are being fed into parseLong() and the truncation behavior is expected/desired.

The current setup of converting all the values to BytesRef, and then relying on a semi-generic formatter (RAW covers both floats and integers) to parse those values back into longs/doubles, seems non-ideal in general. I wonder if we can get rid of this abstraction and just wrap the ValuesSource to do the filtering? Or perhaps differentiate floats vs integers in the formatters?

polyfractal · 2019-02-15T14:57:36Z

Thought about this a bit more, and I think it needs a more thorough refactor than just a simple bandaid patch (not a ton of work, but something a bit more involved). Gonna close this PR since I don't have bandwidth right now to fix it up, but will try to get back to it soon :)

polyfractal added the >bug and :Analytics/Aggregations labels Feb 11, 2019

polyfractal requested a review from jpountz February 11, 2019 16:05

Differentiate from RAW and other DVFormats when converting

efc4311

Non-RAW formats (Dates, Decimal, etc) need the formatter for converting (e.g. a date string to long)

polyfractal closed this Feb 15, 2019
