Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support changing IndexPart::metadata_bytes to json in future release #7693

Merged
merged 8 commits into from
Jun 4, 2024

Conversation

koivunej
Copy link
Member

@koivunej koivunej commented May 10, 2024

Problem

Currently we serialize the TimelineMetadata into bytes to put it into index_part.json. This Vec<u8> (hopefully [u8; 512]) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic.

We've removed the pageserver local metadata file writing in #6769.

Summary of changes

Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competitive serialization method to the current one (crate::tenant::metadata::modern_serde), which accepts both old bytes and new plain JSON.

The benefits of this are that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output, but after enabling it produces lines only proportional to the layer count, like:

{
  "version": ???,
  "layer_metadata": { ... },
  "disk_consistent_lsn": "0/15FD5D8",
  "legacy_metadata": {
    "disk_consistent_lsn": "0/15FD5D8",
    "prev_record_lsn": "0/15FD5A0",
    "ancestor_timeline": null,
    "ancestor_lsn": "0/0",
    "latest_gc_cutoff_lsn": "0/149FD18",
    "initdb_lsn": "0/149FD18",
    "pg_version": 15
  }
}

This is an alternative to #7663, which still uses the Vec<u8> idea, but this time with mixed bincode+json instead of just bincode.

In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in index_part.json, and go ahead with storing metadata or feature flags on the index_part.json itself.

#7699 is the "one release after" changes which starts to produce metadata in the index_part.json as json.

@koivunej koivunej force-pushed the joonas/legacy_metadata_as_json_in_future branch from 4e1d86f to 63e3425 Compare May 10, 2024 11:44
@koivunej koivunej changed the title fix: support changing IndexPart::metadata_bytes to ::legacy_metadata in future release fix: support changing IndexPart::metadata_bytes to json in future release May 10, 2024
@koivunej koivunej changed the title fix: support changing IndexPart::metadata_bytes to json in future release feat: support changing IndexPart::metadata_bytes to json in future release May 10, 2024
Copy link

github-actions bot commented May 10, 2024

3186 tests run: 3047 passed, 0 failed, 139 skipped (full report)


Flaky tests (2)

Postgres 14

  • test_pageserver_restarts_under_worload: release
  • test_peer_recovery: debug

Code coverage* (full report)

  • functions: 31.5% (6590 of 20918 functions)
  • lines: 48.4% (50913 of 105084 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
44d4146 at 2024-06-04T15:23:26.411Z :recycle:

pageserver/src/tenant/metadata.rs Show resolved Hide resolved
@koivunej koivunej force-pushed the joonas/legacy_metadata_as_json_in_future branch from 6970bde to 906ee84 Compare June 4, 2024 14:28
@koivunej
Copy link
Member Author

koivunej commented Jun 4, 2024

We had some discussion perhaps last week if we want to keep the checksum or not in the json. The next PR will include refactoring which makes it trivial to include but, I'd rather not reorder the commits.

@koivunej koivunej merged commit 3d6e389 into main Jun 4, 2024
64 checks passed
@koivunej koivunej deleted the joonas/legacy_metadata_as_json_in_future branch June 4, 2024 16:36
koivunej added a commit that referenced this pull request Jun 11, 2024
We've stored metadata as bytes within the `index_part.json` for 
long fixed reasons. #7693 added support for reading out normal json
serialization of the `TimelineMetadata`.

Change the serialization to only write `TimelineMetadata` as json for
going forward, keeping the backward compatibility to reading the
metadata as bytes. Because of failure to include `alias = "metadata"` in
#7693, one more follow-up is required to make the switch from the old
name to `"metadata": <json>`, but that affects only the field name in
serialized format.

In documentation and naming, an effort is made to add enough warning
signs around TimelineMetadata so that it will receive no changes in the
future. We can add those fields to `IndexPart` directly instead.

Additionally, the path to cleaning up `metadata.rs` is documented in the
`metadata.rs` module comment. If we must extend `TimelineMetadata`
before that, the duplication suggested in [review comment] is the way to
go.

[review comment]:
#7699 (review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants