tapgarden: improve fault injection in unit tests #1031

jharveyb · 2024-07-18T16:58:50Z

Addresses #974 .

The failure observed in CI was an empty confirmation event reached the caretaker in the test basic_asset_creation. That test case doesn't try to cause any faults in any mock interface, but finalize_batch does request that failure here:

taproot-assets/tapgarden/planter_test.go

Line 1484 in 395a180

// confirmation event to be empty.

I concluded that the way we had implemented this fault injection was racy, so here we use atomic Bools for tracking that state, that get unset after use.

I cannot reproduce the original failures locally after 100 runs of the unit-race pkg=tapgarden target.

guggero

Very nice, thanks for the fix. LGTM, been running it on flake hunter for 20 minutes now without any failure.

tapgarden/mock.go

gijswijs · 2024-07-19T10:58:59Z

I managed to get it to fail, using the flake hunter slightly adjusted to only run the tapgarden package:

2024-07-19 12:39:20.315 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateBroadcast -> BatchStateBroadcast
2024-07-19 12:39:20.406 [INF] GRDN: MintingBatch(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): confirmed at block(hash=0000000000000000000000000000000000000000000000000000000000000000, height=1)
2024-07-19 12:39:20.406 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a), advancing from state=BatchStateConfirmed to state=BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateConfirmed -> BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateFinalized -> BatchStateFinalized
2024-07-19 12:39:20.479 [INF] GRDN: ChainCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a) has finished
2024-07-19 12:39:20.479 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): Stopping
--- FAIL: TestBatchedAssetIssuance (27.02s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/001/tmp.db
    --- FAIL: TestBatchedAssetIssuance/basic_asset_creation (10.61s)
        planter_test.go:1162: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/planter_test.go:945
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:1162
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:2056
                Error:          Received unexpected error:
                                predicate not satisfied after time out
                Test:           TestBatchedAssetIssuance/basic_asset_creation
                Messages:       unable to read pending batch: predicate not satisfied after time out
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/002/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/003/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/004/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/005/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/006/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/007/tmp.db
2024-07-19 12:39:20.806 [INF] GRDN: Starting re-org watcher

It's the same test but now it fails with a different error. Not sure what to make of this. Will try to run again, see if I can reproduce the error.

gijswijs · 2024-07-19T11:24:16Z

And this one as well:

==================
WARNING: DATA RACE
Write at 0x00c000a36040 by goroutine 7528:
  github.com/lightningnetwork/lnd/lntest/wait.NoError.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:52 +0x50
  github.com/lightningnetwork/lnd/lntest/wait.Predicate.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:27 +0x33

Previous read at 0x00c000a36040 by goroutine 313:
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:63 +0x124
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 7528 (running) created at:
  github.com/lightningnetwork/lnd/lntest/wait.Predicate()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:26 +0x147
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:60 +0x111
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 313 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:63 +0x2bd
==================
--- FAIL: TestTransactionConfirmedOnly (5.18s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/001/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/002/tmp.db
    custodian_test.go:221: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:221
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676
                Error:          Received unexpected error:
                                event 0 has status 1 but wanted 3
                Test:           TestTransactionConfirmedOnly
    testing.go:1398: race detected during execution of test
FAIL
FAIL    github.com/lightninglabs/taproot-assets/tapgarden       35.844s
FAIL
make: *** [Makefile:229: flake-unit-race] Error 123

jharveyb · 2024-07-22T20:42:43Z

I managed to get it to fail, using the flake hunter slightly adjusted to only run the tapgarden package:

2024-07-19 12:39:20.315 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateBroadcast -> BatchStateBroadcast
2024-07-19 12:39:20.406 [INF] GRDN: MintingBatch(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): confirmed at block(hash=0000000000000000000000000000000000000000000000000000000000000000, height=1)
2024-07-19 12:39:20.406 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a), advancing from state=BatchStateConfirmed to state=BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateConfirmed -> BatchStateFinalized
2024-07-19 12:39:20.471 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): transition states: BatchStateFinalized -> BatchStateFinalized
2024-07-19 12:39:20.479 [INF] GRDN: ChainCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a) has finished
2024-07-19 12:39:20.479 [INF] GRDN: BatchCaretaker(03303a5f92f711ac7e3fb1deb96d2e011caced3f1776ad839b5e5dd5fbaca98d6a): Stopping
--- FAIL: TestBatchedAssetIssuance (27.02s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/001/tmp.db
    --- FAIL: TestBatchedAssetIssuance/basic_asset_creation (10.61s)
        planter_test.go:1162: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/planter_test.go:945
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:1162
                                                        /home/gijs/git/taproot-assets/tapgarden/planter_test.go:2056
                Error:          Received unexpected error:
                                predicate not satisfied after time out
                Test:           TestBatchedAssetIssuance/basic_asset_creation
                Messages:       unable to read pending batch: predicate not satisfied after time out
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/002/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/003/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/004/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/005/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/006/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestBatchedAssetIssuance2437451428/007/tmp.db
2024-07-19 12:39:20.806 [INF] GRDN: Starting re-org watcher

I hit this once and wasn't sure what to make of it either - bumping the timeout helped but that test case isn't using any of these functions to try and cause faults.

jharveyb · 2024-07-22T20:53:51Z

And this one as well:

==================
WARNING: DATA RACE
Write at 0x00c000a36040 by goroutine 7528:
  github.com/lightningnetwork/lnd/lntest/wait.NoError.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:52 +0x50
  github.com/lightningnetwork/lnd/lntest/wait.Predicate.func1()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:27 +0x33

Previous read at 0x00c000a36040 by goroutine 313:
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:63 +0x124
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 7528 (running) created at:
  github.com/lightningnetwork/lnd/lntest/wait.Predicate()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:26 +0x147
  github.com/lightningnetwork/lnd/lntest/wait.NoError()
      /home/gijs/go/pkg/mod/github.com/lightningnetwork/lnd@v0.18.0-beta.rc4.0.20240709210732-0cd11ad84792/lntest/wait/wait.go:60 +0x111
  github.com/lightninglabs/taproot-assets/tapgarden_test.(*custodianHarness).assertEventsPresent()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:196 +0x24e
  github.com/lightninglabs/taproot-assets/tapgarden_test.runTransactionConfirmedOnlyTest()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746 +0x80d
  github.com/lightninglabs/taproot-assets/tapgarden_test.TestTransactionConfirmedOnly()
      /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676 +0x44
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x44

Goroutine 313 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      /usr/local/go/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:2027 +0xf17
  main.main()
      _testmain.go:63 +0x2bd
==================
--- FAIL: TestTransactionConfirmedOnly (5.18s)
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/001/tmp.db
    test_sqlite.go:11: Creating new SQLite DB handle for testing: /tmp/TestTransactionConfirmedOnly2822250713/002/tmp.db
    custodian_test.go:221: 
                Error Trace:    /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:221
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:746
                                                        /home/gijs/git/taproot-assets/tapgarden/custodian_test.go:676
                Error:          Received unexpected error:
                                event 0 has status 1 but wanted 3
                Test:           TestTransactionConfirmedOnly
    testing.go:1398: race detected during execution of test
FAIL
FAIL    github.com/lightninglabs/taproot-assets/tapgarden       35.844s
FAIL
make: *** [Makefile:229: flake-unit-race] Error 123

Nice catch, I've seen this one before but I don't see an open issue for it.

In this commit, we use atomic variables in the unit test mock interfaces to store state about requested failues. We also link requested confirmation failures to a specific conf request. Only a single failure event can be requested at a time, and the mock interfaces reset after that event. This should reduce state being unintentionally shared between test cases.

gijswijs

Those racy errors sure are finicky. I like the smart solution.
I can only reproduce the timeout, but that seems unrelated. Will make an issue for the other error.

dstadulis requested review from gijswijs and guggero July 18, 2024 20:21

guggero approved these changes Jul 19, 2024

View reviewed changes

tapgarden/mock.go Outdated Show resolved Hide resolved

tapgarden/mock.go Outdated Show resolved Hide resolved

dstadulis added this to the v0.4.1 milestone Jul 22, 2024

dstadulis assigned jharveyb Jul 22, 2024

dstadulis added the unit-test label Jul 22, 2024

jharveyb force-pushed the minter_unit_race_fixes branch from d36f4ba to f16c98f Compare July 22, 2024 22:45

gijswijs approved these changes Jul 23, 2024

View reviewed changes

guggero merged commit 109d86f into main Jul 23, 2024
16 checks passed

guggero deleted the minter_unit_race_fixes branch July 23, 2024 14:00

jharveyb mentioned this pull request Aug 6, 2024

testing: mint unit tests consistently fail #974

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tapgarden: improve fault injection in unit tests #1031

tapgarden: improve fault injection in unit tests #1031

jharveyb commented Jul 18, 2024

guggero left a comment

gijswijs commented Jul 19, 2024

gijswijs commented Jul 19, 2024

jharveyb commented Jul 22, 2024

jharveyb commented Jul 22, 2024

gijswijs left a comment

tapgarden: improve fault injection in unit tests #1031

tapgarden: improve fault injection in unit tests #1031

Conversation

jharveyb commented Jul 18, 2024

guggero left a comment

Choose a reason for hiding this comment

gijswijs commented Jul 19, 2024

gijswijs commented Jul 19, 2024

jharveyb commented Jul 22, 2024

jharveyb commented Jul 22, 2024

gijswijs left a comment

Choose a reason for hiding this comment