Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARTEMIS-4910 fix divert config encoding #5079

Merged
merged 1 commit into from
Jul 17, 2024
Merged

Conversation

jbertram
Copy link
Contributor

@jbertram jbertram commented Jul 9, 2024

This commit fixes 4 distinct issues with divert configuration encoding:

  • The encoding size was calculated incorrectly in three ways:
    • Using 0 instead of DataConstants.SIZE_NULL when no transformer is defined resulting in a calculated size discrepancy of -1.
    • Using DataConstants.INT instead of DataConstants.SIZE_INT for the number of transformer properties resulting in a calculated size discrepancy of +2 when using a configuration with a transformer.
    • Using BufferHelper.sizeOfNullableBoolean instead of DataConstants.SIZE_BOOLEAN for the exclusive property resulting in a calculated discrepancy of +1.
  • Encoding was using writeString instead of writeNullableString for the name of the transformer class resulting in an actual buffer size discrepancy of -1.

@gemmellr
Copy link
Member

gemmellr commented Jul 15, 2024

Are there any compatibility implications to consider and either handle or document?

The JIRA looks to cover to an over-the-wire issue...any issues if e.g a rolling upgrade is tried with the different encoding behaviours? Should a particular rollout order be used?

Also, are these values persisted anywhere? If so is there any issue when an updated broker starts with an existing stored configuration value from an older version? EDIT: Seems they are from observing DivertConfigurationStorageTest

@jbertram
Copy link
Contributor Author

jbertram commented Jul 15, 2024

I don't believe there will be any compatibility issues. To be clear, the changes only impact how a divert configuration is encoded. Nothing is changing with how it is decoded. This will mainly impact new brokers encoding new configurations. Any old configuration being read from disk or from the wire by a new broker will continue to work as it always had. Furthermore, a new broker sending a config over the wire to an old broker (e.g. in a HA configuration where the primary is upgraded first) will also work properly not least because now it's being encoded correctly.

@jbertram jbertram force-pushed the ARTEMIS-4910 branch 2 times, most recently from 8183c0a to 42dba83 Compare July 15, 2024 16:35
@gemmellr
Copy link
Member

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour. Either during decoding of old persisted data, or data generated afresh by an old broker. By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'? If so, that would still seems worth documenting somewhere, e.g jsut the Jira.

E.g if the calculated size has changed, which apparently is enough to fix a bug over the wire for live brokers, couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker? Or does it somehow not use that incorrect size of persisted data to decode it correctly, or refreshes it so it has the correct encoded size persisted, when it is a new broker reading it?

@jbertram
Copy link
Contributor Author

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour.

The main thing that changed here is the calculation of the encoding size, a calculation which isn't performed during the decoding process.

The only actual encoding change was to fix a bug where encoding & decoding were using mismatched methods. Encoding was using writeString and decoding was using readNullableString. Technically speaking, this could have been fixed by changing the decoding to using readString, but that would have had implications for compatibility. Instead I changed the encoding to use writeNullableString which means that the data in the buffer is correct now and there should be no compatibility issues.

By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'?

If you have an old broker sending broken data to a new broker the new broker will choke on it. I don't see a reasonable way to avoid that. To clarify, are you saying that this fact needs to be documented?

...if the calculated size has changed...couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker?

The encoding size is calculated and written to disk (or wire) along with the record. The new calculation won't come into play when reading this old data. The new broker will read the data from disk just like the old broker did since there's been no changes to decoding.

@gemmellr
Copy link
Member

I find it awkward to reconcile the encoding having changed, with there being no consideration needed at all in terms of upgrade behaviour.

The main thing that changed here is the calculation of the encoding size, a calculation which isn't performed during the decoding process.

The only actual encoding change was to fix a bug where encoding & decoding were using mismatched methods. Encoding was using writeString and decoding was using readNullableString. Technically speaking, this could have been fixed by changing the decoding to using readString, but that would have had implications for compatibility. Instead I changed the encoding to use writeNullableString which means that the data in the buffer is correct now and there should be no compatibility issues.

By 'continue to work as it always had', do you for example mean 'continue to behave incorrectly, until only data generated by new brokers exists'?

If you have an old broker sending broken data to a new broker the new broker will choke on it. I don't see a reasonable way to avoid that. To clarify, are you saying that this fact needs to be documented?

...if the calculated size has changed...couldnt that have similar effect during the access of old persisted data with the wrong encoded size by a newer broker?

The encoding size is calculated and written to disk (or wire) along with the record. The new calculation won't come into play when reading this old data. The new broker will read the data from disk just like the old broker did since there's been no changes to decoding.

Ok, so you have confirmed everything I thought would be the case. So yes, I think it is reasonable that is at least noted somewhere, even if just the JIRA. Someone who has hit this, upgrades their broker and has now a 'bug fixed' version, is likely to expect that completely resolves their issue. If it actually wont change anything until only data generated by new brokers is in play, it feels like that should be stated so the user can figure that out and learn what to do about it. E.g is it a case of deleting their store to remove the old persisted thing? Removing a specific config and recreating it? Other? I really dont know, hence the suggestion we should perhaps have a few word covering the scenario for anyone else that wouldnt immediately know.

This commit fixes 4 distinct issues with divert configuration encoding:

 - The encoding size was calculated incorrectly in three ways:
   - Using 0 instead of `DataConstants.SIZE_NULL` when no transformer is
     defined resulting in a calculated size discrepancy of -1.
   - Using `DataConstants.INT` instead of `DataConstants.SIZE_INT` for
     the number of transformer properties resulting in a calculated size
     discrepancy of +2 when using a configuration with a transformer.
   - Using `BufferHelper.sizeOfNullableBoolean` instead of
     `DataConstants.SIZE_BOOLEAN` for the `exclusive` property resulting
     in a calculated discrepancy of +1.
 - Encoding was using `writeString` instead of `writeNullableString` for
   the name of the transformer class resulting in an actual buffer size
   discrepancy of -1.

Aside from these fixes this commit also:

 - Updates the divert storage test to force a reload from disk which
   will also trigger this problem.
 - Adds new tests specifically for encoding & decoding include a test to
   verify known bytes during encoding.

Lastly, it's worth noting that this *won't fix* bad data that was
already stored to disk by an older broker or bad data that comes over
the wire from an older broker (e.g. if a older primary broker paired
with a newer backup).
@gemmellr gemmellr merged commit 1a83220 into apache:main Jul 17, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants