Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tapgarden: improve fault injection in unit tests #1031

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

jharveyb
Copy link
Collaborator

Addresses #974 .

The failure observed in CI was an empty confirmation event reached the caretaker in the test basic_asset_creation. That test case doesn't try to cause any faults in any mock interface, but finalize_batch does request that failure here:

// confirmation event to be empty.

I concluded that the way we had implemented this fault injection was racy, so here we use atomic Bools for tracking that state, that get unset after use.

I cannot reproduce the original failures locally after 100 runs of the unit-race pkg=tapgarden target.

Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks for the fix. LGTM, been running it on flake hunter for 20 minutes now without any failure.

tapgarden/mock.go Outdated Show resolved Hide resolved
tapgarden/mock.go Outdated Show resolved Hide resolved
@gijswijs
Copy link
Contributor

I managed to get it to fail, using the flake hunter slightly adjusted to only run the tapgarden package:

2024-07-19 12:39:20.315 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateBroadcast -> BatchStateBroadcast
2024-07-19 12:39:20.406 [INF] GRDN: MintingBatch(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): confirmed at block(hash=0000000000000000000000000000000000000000000000000000000000000000, height=1)
2024-07-19 12:39:20.406 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a), advancing from state=BatchStateConfirmed to state=BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateConfirmed -> BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateFinalized -> BatchStateFinalized
2024-07-19 12:39:20.479 [INF] GRDN: ChainCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a) has finished
2024-07-19 12:39:20.479 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): Stopping
--- FAIL: TestBatchedAssetIssuance (27.02s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/001/tmp.db
    --- FAIL: TestBatchedAssetIssuance/basic_asset_creation (10.61s)
        planter_test.go:1162: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/planter_test.go:945
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:1162
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:2056
                Error:          Received unexpected error:
                                predicate not satisfied after time out
                Test:           TestBatchedAssetIssuance/basic_asset_creation
                Messages:       unable to read pending batch: predicate not satisfied after time out
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/002/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/003/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/004/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/005/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/006/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/007/tmp.db
2024-07-19 12:39:20.806 [INF] GRDN: Starting re-org watcher

It's the same test but now it fails with a different error. Not sure what to make of this. Will try to run again, see if I can reproduce the error.

@gijswijs
Copy link
Contributor

And this one as well:

==================
WARNING: DATA RACE
Write at 0x00c000a36040 by goroutine 7528:
  github.com/lightningnetwork/lnd/lntest/wait.NoError.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:52 +0x50
  github.com/lightningnetwork/lnd/lntest/wait.Predicate.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:27 +0x33

Previous read at 0x00c000a36040 by goroutine 313:
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:63 +0x124
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 7528 (running) created at:
  github.com/lightningnetwork/lnd/lntest/wait.Predicate()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:26 +0x147
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:60 +0x111
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 313 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:63 +0x2bd
==================
--- FAIL: TestTransactionConfirmedOnly (5.18s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/001/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/002/tmp.db
    custodian_test.go:221: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:221
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676
                Error:          Received unexpected error:
                                event 0 has status 1 but wanted 3
                Test:           TestTransactionConfirmedOnly
    testing.go:1398: race detected during execution of test
FAIL
FAIL    github.com/lightninglabs/taproot-assets/tapgarden       35.844s
FAIL
make: *** [Makefile:229: flake-unit-race] Error 123

@dstadulis dstadulis added this to the v0.4.1 milestone Jul 22, 2024
@jharveyb
Copy link
Collaborator Author

I managed to get it to fail, using the flake hunter slightly adjusted to only run the tapgarden package:

2024-07-19 12:39:20.315 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateBroadcast -> BatchStateBroadcast
2024-07-19 12:39:20.406 [INF] GRDN: MintingBatch(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): confirmed at block(hash=0000000000000000000000000000000000000000000000000000000000000000, height=1)
2024-07-19 12:39:20.406 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a), advancing from state=BatchStateConfirmed to state=BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateConfirmed -> BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateFinalized -> BatchStateFinalized
2024-07-19 12:39:20.479 [INF] GRDN: ChainCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a) has finished
2024-07-19 12:39:20.479 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): Stopping
--- FAIL: TestBatchedAssetIssuance (27.02s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/001/tmp.db
    --- FAIL: TestBatchedAssetIssuance/basic_asset_creation (10.61s)
        planter_test.go:1162: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/planter_test.go:945
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:1162
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:2056
                Error:          Received unexpected error:
                                predicate not satisfied after time out
                Test:           TestBatchedAssetIssuance/basic_asset_creation
                Messages:       unable to read pending batch: predicate not satisfied after time out
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/002/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/003/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/004/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/005/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/006/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/007/tmp.db
2024-07-19 12:39:20.806 [INF] GRDN: Starting re-org watcher

I hit this once and wasn't sure what to make of it either - bumping the timeout helped but that test case isn't using any of these functions to try and cause faults.

@jharveyb
Copy link
Collaborator Author

And this one as well:

==================
WARNING: DATA RACE
Write at 0x00c000a36040 by goroutine 7528:
  github.com/lightningnetwork/lnd/lntest/wait.NoError.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:52 +0x50
  github.com/lightningnetwork/lnd/lntest/wait.Predicate.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:27 +0x33

Previous read at 0x00c000a36040 by goroutine 313:
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:63 +0x124
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 7528 (running) created at:
  github.com/lightningnetwork/lnd/lntest/wait.Predicate()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:26 +0x147
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:60 +0x111
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 313 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:63 +0x2bd
==================
--- FAIL: TestTransactionConfirmedOnly (5.18s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/001/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/002/tmp.db
    custodian_test.go:221: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:221
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676
                Error:          Received unexpected error:
                                event 0 has status 1 but wanted 3
                Test:           TestTransactionConfirmedOnly
    testing.go:1398: race detected during execution of test
FAIL
FAIL    github.com/lightninglabs/taproot-assets/tapgarden       35.844s
FAIL
make: *** [Makefile:229: flake-unit-race] Error 123

Nice catch, I've seen this one before but I don't see an open issue for it.

In this commit, we use atomic variables in the unit test mock interfaces
to store state about requested failues. We also link requested
confirmation failures to a specific conf request. Only a single failure
event can be requested at a time, and the mock interfaces reset after
that event. This should reduce state being unintentionally shared
between test cases.
Copy link
Contributor

@gijswijs gijswijs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those racy errors sure are finicky. I like the smart solution.
I can only reproduce the timeout, but that seems unrelated. Will make an issue for the other error.

@guggero guggero merged commit 109d86f into main Jul 23, 2024
16 checks passed
@guggero guggero deleted the minter_unit_race_fixes branch July 23, 2024 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants