Feature or Problem Description
Imagine the following scenario:
1. An operator accidentally runs TRUNCATE on all registry data in Postgres.
2. Kafka producers continue to run and re-register their schemas during serialization.
3. The newly registered schemas receive different IDs than their predecessors, even if their content is identical to previously seen versions.
4. The error is detected, the registry is taken down for maintenance, and the previous state of the registry is restored from backup.
5. The producers resume adding the pre-incident schema IDs to their records.
Result: The data produced during the incident becomes incomprehensible to consumers after recovery, because the schema IDs embedded in those records no longer resolve to the right schemas in the restored registry. Affected producers will need to be rewound (if that is possible at all) to a point in time that precedes the start of the incident so they can re-emit the records with correct schema IDs.
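To make the failure mode concrete, here is a minimal sketch of the scenario using a toy in-memory registry. Everything in it is hypothetical: the `SequentialRegistry` class, its method names, and the assumption that IDs come from a counter that keeps counting after the truncate. It is not the registry's actual implementation.

```python
import copy


class SequentialRegistry:
    """Toy registry that assigns IDs from an ever-increasing counter (assumption)."""

    def __init__(self):
        self.by_id = {}
        self.next_id = 1

    def register(self, schema: str) -> int:
        # Return the existing ID if this exact content is already stored.
        for schema_id, stored in self.by_id.items():
            if stored == schema:
                return schema_id
        schema_id = self.next_id
        self.next_id += 1
        self.by_id[schema_id] = schema
        return schema_id

    def truncate(self):
        # Rows are wiped; the counter is assumed to keep its position.
        self.by_id = {}


ORDER_SCHEMA = '{"type": "record", "name": "Order", "fields": []}'

registry = SequentialRegistry()
pre_incident_id = registry.register(ORDER_SCHEMA)

backup = copy.deepcopy(registry.by_id)  # state captured before the incident

registry.truncate()                                # the accidental TRUNCATE
incident_id = registry.register(ORDER_SCHEMA)      # producer re-registers

registry.by_id = copy.deepcopy(backup)             # restore from backup

# Records written during the incident carry incident_id, which the restored
# registry has never seen.
lookup_after_restore = registry.by_id.get(incident_id)
print(pre_incident_id, incident_id, lookup_after_restore)
```

In this sketch the identical schema receives a new ID during the incident, and that ID resolves to nothing once the backup is restored.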
Proposed Solution
If the registry could uniquely identify and store schemas based on their content (e.g. by using the schema hash as an ID), the IDs generated during the incident would match those that existed before the data corruption. As a result, the data that would get produced and serialized during the incident would continue to make sense to consumers even after recovery.
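The idea can be sketched as follows, assuming schemas are JSON documents and that a SHA-256 digest of a canonicalized form serves as the ID. The `content_id` function and `ContentAddressedRegistry` class are hypothetical names for illustration, and real schema canonicalization is likely more involved than sorting JSON keys.

```python
import hashlib
import json


def content_id(schema: dict) -> str:
    """Derive an ID purely from schema content (hypothetical helper)."""
    # Sort keys and strip whitespace so equivalent documents hash identically.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ContentAddressedRegistry:
    """Toy registry keyed by content hash instead of a counter (assumption)."""

    def __init__(self):
        self.by_id = {}

    def register(self, schema: dict) -> str:
        schema_id = content_id(schema)
        self.by_id[schema_id] = schema
        return schema_id


USER_SCHEMA = {"type": "record", "name": "User", "fields": []}

registry = ContentAddressedRegistry()
id_before = registry.register(USER_SCHEMA)

backup = dict(registry.by_id)                 # state captured before the incident

registry.by_id = {}                           # the accidental TRUNCATE
id_during = registry.register(USER_SCHEMA)    # producer re-registers

registry.by_id = dict(backup)                 # restore from backup

# The ID written during the incident is the same one the backup contains.
resolved = registry.by_id[id_during]
print(id_before == id_during, resolved == USER_SCHEMA)
```

Because the ID depends only on the content, re-registering after the data loss yields the same ID, and records produced during the incident still resolve against the restored state.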
Additional Context
Is there an existing approach for managing the risks of a temporary corruption of registry state?
There is. Depending on what you want to achieve, one option is to use just the topic id strategy so that, even if a new schema is registered, resolution still uses the correct one. We also have a strategy that uses a contentHash identifier, which is calculated purely from the content itself, so that would be another alternative. Looking at it now, this is not well documented, so let me make some improvements there and I'll add them here.