tests: accommodate some messages that can fail tests (#8144)
## Problem

- `test_storage_controller_many_tenants` can fail with warnings in the
storage controller about timeline creation holding a lock for too long,
because this test stresses the machine running it with many
concurrent timeline creations
- `test_tenant_delete_smoke` can fail when synthetic remote storage
errors show up as a general deletion error rather than as the explicit
simulated-failure message

## Summary of changes

- tolerate warnings about slow timeline creation in
`test_storage_controller_many_tenants`
- tolerate both possible error messages in `error_tolerant_delete`
jcsp committed Jun 24, 2024
1 parent 3d76093 commit 1ea5d8b
Showing 2 changed files with 16 additions and 3 deletions.
11 changes: 10 additions & 1 deletion test_runner/performance/test_storage_controller_scale.py
@@ -48,7 +48,16 @@ def test_storage_controller_many_tenants(
 
     # We will intentionally stress reconciler concurrency, which triggers a warning when lots
     # of shards are hitting the delayed path.
-    env.storage_controller.allowed_errors.append(".*Many shards are waiting to reconcile")
+    env.storage_controller.allowed_errors.extend(
+        [
+            # We will intentionally stress reconciler concurrency, which triggers a warning when lots
+            # of shards are hitting the delayed path.
+            ".*Many shards are waiting to reconcile",
+            # We will create many timelines concurrently, so they might get slow enough to trip the warning
+            # that timeline creation is holding a lock too long.
+            ".*Shared lock by TimelineCreate.*was held.*",
+        ]
+    )
 
     for ps in env.pageservers:
         # This can happen because when we do a loop over all pageservers and mark them offline/active,
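
The effect of extending `allowed_errors` can be sketched in isolation. This is a minimal illustration, assuming the test harness treats each entry as a regular expression and flags any warning or error log line that matches none of them. The helper `unexpected_lines` and the sample log lines are hypothetical and are not part of the test runner.

```python
import re

# Patterns copied from the diff above.
ALLOWED_ERRORS = [
    ".*Many shards are waiting to reconcile",
    ".*Shared lock by TimelineCreate.*was held.*",
]


def unexpected_lines(log_lines, allowed_patterns):
    """Return the log lines that match none of the allowed patterns.

    Hypothetical helper: the real fixture's matching logic may differ.
    """
    compiled = [re.compile(pattern) for pattern in allowed_patterns]
    return [
        line
        for line in log_lines
        if not any(rx.match(line) for rx in compiled)
    ]


# Made-up log lines, for illustration only.
sample_log = [
    "WARN Many shards are waiting to reconcile (1024)",
    "WARN Shared lock by TimelineCreate was held for 12s",
    "ERROR something else went wrong",
]

# Only the line that no pattern covers is left over.
leftover = unexpected_lines(sample_log, ALLOWED_ERRORS)
```

Under that assumption, each pattern starts with `.*` so that it can match anywhere in a line even when the matcher anchors at the start of the line.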
8 changes: 6 additions & 2 deletions test_runner/regress/test_tenant_delete.py
@@ -31,8 +31,12 @@ def error_tolerant_delete(ps_http, tenant_id):
             if e.status_code == 500:
                 # This test uses failure injection, which can produce 500s as the pageserver expects
                 # the object store to always be available, and the ListObjects during deletion is generally
-                # an infallible operation
-                assert "simulated failure of remote operation" in e.message
+                # an infallible operation. This can show up as a clear simulated error, or as a general
+                # error during delete_objects()
+                assert (
+                    "simulated failure of remote operation" in e.message
+                    or "failed to delete" in e.message
+                )
             else:
                 raise
         else:
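
The relaxed assertion can be exercised on its own with a small sketch. `FakeApiException`, `FlakyClient`, and `delete_with_retries` are hypothetical stand-ins written for this example. They assume only that the exception exposes `status_code` and `message` as in the diff above, and they are not the test runner's real classes.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for the pageserver API exception."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


class FlakyClient:
    """Hypothetical client: raises the queued errors in order, then succeeds."""

    def __init__(self, failures):
        self.failures = list(failures)
        self.calls = 0

    def tenant_delete(self, tenant_id):
        self.calls += 1
        if self.failures:
            raise self.failures.pop(0)


# The two message fragments tolerated by the updated assertion.
TOLERATED = ("simulated failure of remote operation", "failed to delete")


def delete_with_retries(client, tenant_id, max_attempts=10):
    """Retry deletion while injected 500s occur; re-raise anything else."""
    for _ in range(max_attempts):
        try:
            client.tenant_delete(tenant_id)
        except FakeApiException as e:
            if e.status_code == 500:
                # Either message form is acceptable under failure injection.
                assert any(fragment in e.message for fragment in TOLERATED)
            else:
                raise
        else:
            return
    raise RuntimeError("tenant deletion did not succeed")


client = FlakyClient(
    [
        FakeApiException("simulated failure of remote operation", 500),
        FakeApiException("failed to delete 3 objects", 500),
    ]
)
delete_with_retries(client, "tenant-a")
```

In this sketch the first two calls fail with a tolerated 500 and the third succeeds. A 500 carrying any other message would still trip the assertion, so unexpected failures are not silently swallowed.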

1 comment on commit 1ea5d8b

@github-actions
3004 tests run: 2878 passed, 0 failed, 126 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_scrubber_tenant_snapshot[4]: release

Code coverage* (full report)

  • functions: 32.6% (6883 of 21131 functions)
  • lines: 50.2% (53670 of 106963 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
1ea5d8b at 2024-06-24T18:29:40.538Z :recycle: