ARTEMIS-4910 fix divert config encoding #5079

jbertram · 2024-07-09T21:47:59Z

This commit fixes 4 distinct issues with divert configuration encoding:

The encoding size was calculated incorrectly in three ways:
- Using 0 instead of DataConstants.SIZE_NULL when no transformer is defined resulting in a calculated size discrepancy of -1.
- Using DataConstants.INT instead of DataConstants.SIZE_INT for the number of transformer properties resulting in a calculated size discrepancy of +2 when using a configuration with a transformer.
- Using BufferHelper.sizeOfNullableBoolean instead of DataConstants.SIZE_BOOLEAN for the exclusive property resulting in a calculated discrepancy of +1.
Encoding was using writeString instead of writeNullableString for the name of the transformer class resulting in an actual buffer size discrepancy of -1.

...r/src/test/java/org/apache/activemq/artemis/core/config/DivertConfigurationEncodingTest.java

gemmellr · 2024-07-15T10:30:01Z

Are there any compatibility implications to consider and either handle or document?

The JIRA looks to cover to an over-the-wire issue...any issues if e.g a rolling upgrade is tried with the different encoding behaviours? Should a particular rollout order be used?

Also, are these values persisted anywhere? If so is there any issue when an updated broker starts with an existing stored configuration value from an older version? EDIT: Seems they are from observing DivertConfigurationStorageTest

jbertram · 2024-07-15T15:01:12Z

I don't believe there will be any compatibility issues. To be clear, the changes only impact how a divert configuration is encoded. Nothing is changing with how it is decoded. This will mainly impact new brokers encoding new configurations. Any old configuration being read from disk or from the wire by a new broker will continue to work as it always had. Furthermore, a new broker sending a config over the wire to an old broker (e.g. in a HA configuration where the primary is upgraded first) will also work properly not least because now it's being encoded correctly.

gemmellr · 2024-07-15T17:02:29Z

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour. Either during decoding of old persisted data, or data generated afresh by an old broker. By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'? If so, that would still seems worth documenting somewhere, e.g jsut the Jira.

E.g if the calculated size has changed, which apparently is enough to fix a bug over the wire for live brokers, couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker? Or does it somehow not use that incorrect size of persisted data to decode it correctly, or refreshes it so it has the correct encoded size persisted, when it is a new broker reading it?

jbertram · 2024-07-15T18:45:11Z

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour.

The main thing that changed here is the calculation of the encoding size, a calculation which isn't performed during the decoding process.

The only actual encoding change was to fix a bug where encoding & decoding were using mismatched methods. Encoding was using writeString and decoding was using readNullableString. Technically speaking, this could have been fixed by changing the decoding to using readString, but that would have had implications for compatibility. Instead I changed the encoding to use writeNullableString which means that the data in the buffer is correct now and there should be no compatibility issues.

By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'?

If you have an old broker sending broken data to a new broker the new broker will choke on it. I don't see a reasonable way to avoid that. To clarify, are you saying that this fact needs to be documented?

...if the calculated size has changed...couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker?

The encoding size is calculated and written to disk (or wire) along with the record. The new calculation won't come into play when reading this old data. The new broker will read the data from disk just like the old broker did since there's been no changes to decoding.

gemmellr · 2024-07-16T10:24:19Z

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour.

The main thing that changed here is the calculation of the encoding size, a calculation which isn't performed during the decoding process.

The only actual encoding change was to fix a bug where encoding & decoding were using mismatched methods. Encoding was using writeString and decoding was using readNullableString. Technically speaking, this could have been fixed by changing the decoding to using readString, but that would have had implications for compatibility. Instead I changed the encoding to use writeNullableString which means that the data in the buffer is correct now and there should be no compatibility issues.

By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'?

If you have an old broker sending broken data to a new broker the new broker will choke on it. I don't see a reasonable way to avoid that. To clarify, are you saying that this fact needs to be documented?

...if the calculated size has changed...couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker?

The encoding size is calculated and written to disk (or wire) along with the record. The new calculation won't come into play when reading this old data. The new broker will read the data from disk just like the old broker did since there's been no changes to decoding.

Ok, so you have confirmed everything I thought would be the case. So yes, I think it is reasonable that is at least noted somewhere, even if just the JIRA. Someone who has hit this, upgrades their broker and has now a 'bug fixed' version, is likely to expect that completely resolves their issue. If it actually wont change anything until only data generated by new brokers is in play, it feels like that should be stated so the user can figure that out and learn what to do about it. E.g is it a case of deleting their store to remove the old persisted thing? Removing a specific config and recreating it? Other? I really dont know, hence the suggestion we should perhaps have a few word covering the scenario for anyone else that wouldnt immediately know.

This commit fixes 4 distinct issues with divert configuration encoding: - The encoding size was calculated incorrectly in three ways: - Using 0 instead of `DataConstants.SIZE_NULL` when no transformer is defined resulting in a calculated size discrepancy of -1. - Using `DataConstants.INT` instead of `DataConstants.SIZE_INT` for the number of transformer properties resulting in a calculated size discrepancy of +2 when using a configuration with a transformer. - Using `BufferHelper.sizeOfNullableBoolean` instead of `DataConstants.SIZE_BOOLEAN` for the `exclusive` property resulting in a calculated discrepancy of +1. - Encoding was using `writeString` instead of `writeNullableString` for the name of the transformer class resulting in an actual buffer size discrepancy of -1. Aside from these fixes this commit also: - Updates the divert storage test to force a reload from disk which will also trigger this problem. - Adds new tests specifically for encoding & decoding include a test to verify known bytes during encoding. Lastly, it's worth noting that this *won't fix* bad data that was already stored to disk by an older broker or bad data that comes over the wire from an older broker (e.g. if a older primary broker paired with a newer backup).

gemmellr reviewed Jul 15, 2024

View reviewed changes

...r/src/test/java/org/apache/activemq/artemis/core/config/DivertConfigurationEncodingTest.java Show resolved Hide resolved

jbertram force-pushed the ARTEMIS-4910 branch 2 times, most recently from 8183c0a to 42dba83 Compare July 15, 2024 16:35

jbertram force-pushed the ARTEMIS-4910 branch from 42dba83 to 17644f0 Compare July 16, 2024 19:50

jbertram force-pushed the ARTEMIS-4910 branch from 17644f0 to db1d013 Compare July 16, 2024 19:53

gemmellr merged commit 1a83220 into apache:main Jul 17, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARTEMIS-4910 fix divert config encoding #5079

ARTEMIS-4910 fix divert config encoding #5079

jbertram commented Jul 9, 2024 •

edited

Loading

gemmellr commented Jul 15, 2024 •

edited

Loading

jbertram commented Jul 15, 2024 •

edited

Loading

gemmellr commented Jul 15, 2024

jbertram commented Jul 15, 2024

gemmellr commented Jul 16, 2024

ARTEMIS-4910 fix divert config encoding #5079

ARTEMIS-4910 fix divert config encoding #5079

Conversation

jbertram commented Jul 9, 2024 • edited Loading

gemmellr commented Jul 15, 2024 • edited Loading

jbertram commented Jul 15, 2024 • edited Loading

gemmellr commented Jul 15, 2024

jbertram commented Jul 15, 2024

gemmellr commented Jul 16, 2024

jbertram commented Jul 9, 2024 •

edited

Loading

gemmellr commented Jul 15, 2024 •

edited

Loading

jbertram commented Jul 15, 2024 •

edited

Loading