Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IncludeExclude does not need formatter when converting longs #38739

Closed

Conversation

polyfractal
Copy link
Contributor

Using the formatter converts longs (wrapped in a BytesRef) to a double, then rounds/truncates back to a long. This loses precision for large longs.

I might be missing something, but I think we should instead just convert the string value directly to a long.

The disadvantage is that a floating point value will throw a NumberFormatException, but it looks like this shouldn't be a problem. The only two usages are by terms and sigterms aggs, and both check if the number is a float first and use convertToDoubleFilter().

We could fall back to the old behavior if a NumberFormatException is thrown, but this feels wrong to me since we should only be dealing with longs in the long filter, and silently truncating precision is worse than just throwing an exception imo.

Closes #38692

Using the formatter converts longs (inside a BytesRef) to a double,
then rounds/truncates back to a long.  This loses precision for large
longs.

We should instead just convert the string value directly to a long.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

Non-RAW formats (Dates, Decimal, etc) need the formatter for converting
(e.g. a date string to long)
@polyfractal
Copy link
Contributor Author

Heh, well, I see what I was missing: dates (and presumably IPs, decimals, etc) fail due to NumberFormatExceptions now.

I pushed a fix where we fall back to the formatter for non-RAW formatters. It's not super elegant, but I wanted to minimize the fallout from a change like this. Open to suggestions.

@cyberhuman
Copy link

Shouldn't RAW formatter's parseLong be updated instead? I'm not sure where else it's used, but I imagine that its users may expect 64-bit long values to be supported, don't they?

@polyfractal
Copy link
Contributor Author

Possibly! I'm not crazy about the currently proposed PR, but admit that I was partially waiting to see if Adrien had an opinion when he got back as well :)

I feel like there may be some context here that I'm missing. E.g. I suspect there is (or potentially was but has since been removed) another place where floats are being fed into parseLong() and the truncation behavior is expected/desired.

The current setup of converting all the values to BytesRef, and then relying on a semi-generic formatter (RAW includes both floats and natural numbers) to parse those values back into longs/doubles seems non-ideal in general. I wonder if we can get rid of this abstraction and just wrap the ValuesSource to do the filtering? Or perhaps differentiate floats vs natural numbers in the formatters?

@polyfractal
Copy link
Contributor Author

Thought about this a bit more, and I think it needs a more thorough refactor than just a simple bandaid patch (not a ton of work, but something a bit more involved). Gonna close this PR since I don't have bandwidth right now to fix it up, but will try to get back to it soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants