Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: max_lsn_wal_lag broken in tenant conf #4279

Merged
merged 2 commits into from
May 23, 2023

Conversation

problame
Copy link
Contributor

This patch fixes parsing of the max_lsn_wal_lag tenant config item.
We were incorrectly expecting a string before, but the type is a NonZeroU64.

So, when setting it in the config, the (updated) test case would fail with

 E       psycopg2.errors.InternalError_: Tenant a1fa9cc383e32ddafb73ff920de5f2e6 will not become active. Current state: Broken due to: Failed to parse config from file '.../repo/tenants/a1fa9cc383e32ddafb73ff920de5f2e6/config' as pageserver config: configure option max_lsn_wal_lag is not a string. Backtrace:

So, not even the assertions added are necessary.

The test coverage for tenant config is rather thin in general.
For example, the test_tenant_conf.py test doesn't cover all the options.

I'll add a new regression test as part of attach-time-tenant-conf PR #4255

fails with

E       psycopg2.errors.InternalError_: Tenant a1fa9cc383e32ddafb73ff920de5f2e6 will not become active. Current state: Broken due to: Failed to parse config from file '/home/cs/src/neon/test_output/test_tenant_config[debug-pg14]/repo/tenants/a1fa9cc383e32ddafb73ff920de5f2e6/config' as pageserver config: configure option max_lsn_wal_lag is not a string. Backtrace:

So, not even the assertions added are necessary.
But let's have them anyway.
@github-actions
Copy link

992 tests run: 951 passed, 0 failed, 41 skipped (full report)


The comment gets automatically updated with the latest test results
9204bb3 at 2023-05-19T12:01:49.113Z :recycle:

@problame problame marked this pull request as ready for review May 19, 2023 12:33
@problame problame requested review from a team as code owners May 19, 2023 12:33
@problame problame requested review from knizhnik and removed request for a team May 19, 2023 12:33
Copy link
Contributor

@LizardWizzard LizardWizzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

While looking at other integer-ish parameters there seems to be some inconsistency. checkpoint_distance is parsed using parse_toml_u64 and so many others. I tried to replace parse_toml_u64 with the same deserialize_from_item and test passed for me. So may make sense to unify those and possibly remove parse_toml_u64. Not requiring these changes to be made, up to you, feel free to merge as it is

@problame problame removed the request for review from knizhnik May 19, 2023 13:30
@problame problame changed the base branch from main to bayandin/fix-postgis-downloading May 19, 2023 13:31
@problame problame self-assigned this May 19, 2023
Base automatically changed from bayandin/fix-postgis-downloading to main May 19, 2023 14:34
@problame problame enabled auto-merge (squash) May 19, 2023 14:40
Copy link
Member

@koivunej koivunej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counting the days we can make this code generation based :)

@problame
Copy link
Contributor Author

I'm waiting for e2e build failures to be resolved, see https://neondb.slack.com/archives/C03438W3FLZ/p1684747576440749 to

@problame problame merged commit 4d41b2d into main May 23, 2023
@problame problame deleted the problame/fix-tenant-conf-max-wal-lag branch May 23, 2023 13:30
problame added a commit that referenced this pull request May 24, 2023
This PR adds support for supplying the tenant config upon /attach.

Before this change, when relocating a tenant using `/detach` and
`/attach`, the tenant config after `/attach` would be the default config
from `pageserver.toml`.
That is undesirable for settings such as the PITR-interval: if the
tenant's config on the source was `30 days` and the default config on
the attach-side is `7 days`, then the first GC run would eradicate 23
days worth of PITR capability.

The API change is backwards-compatible: if the body is empty, we
continue to use the default config.
We'll remove that capability as soon as the cloud.git code is updated to
use attach-time tenant config
(#4282 keeps track of this).

unblocks neondatabase/cloud#5092 
fixes #1555
part of #886 (Tenant
Relocation)

Implementation
==============

The preliminary PRs for this work were (most-recent to least-recent)

* #4279
* #4267
* #4252
* #4235
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants