Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EN Performance] Reuse ledger state for about -200GB peak RAM, -160GB disk i/o, and about -32 minutes duration #2792

Merged
merged 38 commits into from
Aug 2, 2022
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
07aa295
Reuse ledger state in checkpoints (save 152GB RAM)
fxamacker Jul 13, 2022
6944327
Refactor Ledger/DiskWAL/Compactor shutdowns
fxamacker Jul 19, 2022
64d4c03
Handle more error conditions in compactor
fxamacker Jul 20, 2022
3b35eb9
Check closed channel in Compactor
fxamacker Jul 20, 2022
6d60646
Add debug logging in test
fxamacker Jul 20, 2022
5a0044f
Add debug logging in test
fxamacker Jul 20, 2022
bf38e53
Avoid observer.OnNext() when Compactor is closing
fxamacker Jul 21, 2022
710f1c6
Handle ledger updates in parallel with Compactor
fxamacker Jul 21, 2022
8e036c8
Make Compactor test less flaky
fxamacker Jul 21, 2022
c053aaf
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Jul 21, 2022
65f2062
Refactor trie update processing in Compactor
fxamacker Jul 21, 2022
b8797cc
Refactor Compactor shutdown
fxamacker Jul 22, 2022
6dddcf7
Use CheckpointQueue to store tries for checkpointing
fxamacker Jul 24, 2022
ccf4571
Fix lint error
fxamacker Jul 24, 2022
cca4a71
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Jul 24, 2022
91320b4
Remove anon goroutine updating WAL in Ledger.Set
fxamacker Jul 25, 2022
2127e44
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Jul 25, 2022
9b9d6f3
Add comments
fxamacker Jul 26, 2022
f048d62
Send WAL update result outside of defer
fxamacker Jul 26, 2022
c5455bd
Refactor to use constants for default values
fxamacker Jul 27, 2022
240520d
Add comment
fxamacker Jul 27, 2022
853ff05
Rename CheckpointQueue to TrieQueue
fxamacker Jul 27, 2022
46576c5
Add comment for DiskWAL.RecordUpdate
fxamacker Jul 27, 2022
31ff7e4
Add comment for WAL shutdown in Compactor
fxamacker Jul 27, 2022
7e29a67
Add more logging in Compactor
fxamacker Jul 27, 2022
b657ca3
Add comment for Compactor and Ledger
fxamacker Jul 27, 2022
4d81b17
Fix linter error
fxamacker Jul 27, 2022
6af9f11
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Jul 28, 2022
8afb36b
Refactor compactor tests
fxamacker Jul 29, 2022
748f56e
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Jul 30, 2022
bd62030
Fix lint error
fxamacker Jul 31, 2022
9f2ec46
Add code comments
fxamacker Jul 31, 2022
22eeb3e
Only retry checkpointing on appropriate error
fxamacker Jul 31, 2022
f774617
Remove unnecessary branch in TrieQueue
fxamacker Jul 31, 2022
ec132df
[Ledger] Replace LRU cache with a FIFO queue (circular buffer) (#2893)
ramtinms Aug 2, 2022
e5cce3e
Merge branch 'master' into fxamacker/reuse-mtrie-state-for-checkpoint…
fxamacker Aug 2, 2022
711deaa
Fix lint errors
fxamacker Aug 2, 2022
c700389
Fix PurgeCacheExcept test
fxamacker Aug 2, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions cmd/bootstrap/run/execution_state.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,14 +41,14 @@ func GenerateExecutionState(
if err != nil {
return flow.DummyStateCommitment, err
}
defer func() {
<-diskWal.Done()
}()

ledgerStorage, err := ledger.NewLedger(diskWal, 100, metricsCollector, zerolog.Nop(), ledger.DefaultPathFinderVersion)
if err != nil {
return flow.DummyStateCommitment, err
}
defer func() {
<-ledgerStorage.Done()
}()

return bootstrap.NewBootstrapper(
zerolog.Nop()).BootstrapLedger(
Expand Down
30 changes: 14 additions & 16 deletions cmd/execution_builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -343,12 +343,6 @@ func (e *ExecutionNodeBuilder) LoadComponentsAndModules() {
}
return nil
}).
Component("Write-Ahead Log", func(node *NodeConfig) (module.ReadyDoneAware, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why moving this down?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiskWAL needs to be a dependent component of Ledger because I found component shutdown is parallel (although it used to be in sequence).

var err error
diskWAL, err = wal.NewDiskWAL(node.Logger.With().Str("subcomponent", "wal").Logger(),
node.MetricsRegisterer, collector, e.exeConf.triedir, int(e.exeConf.mTrieCacheSize), pathfinder.PathByteSize, wal.SegmentSize)
return diskWAL, err
}).
Component("execution state ledger", func(node *NodeConfig) (module.ReadyDoneAware, error) {

// check if the execution database already exists
Expand Down Expand Up @@ -384,23 +378,27 @@ func (e *ExecutionNodeBuilder) LoadComponentsAndModules() {
}
}

ledgerStorage, err = ledger.NewLedger(diskWAL, int(e.exeConf.mTrieCacheSize), collector, node.Logger.With().Str("subcomponent",
// Ledger is responsible for starting and shutdowning DiskWAL component.
// This ensures that all WAL updates are completed before closing opened WAL segment.
diskWAL, err = wal.NewDiskWAL(node.Logger.With().Str("subcomponent", "wal").Logger(),
node.MetricsRegisterer, collector, e.exeConf.triedir, int(e.exeConf.mTrieCacheSize), pathfinder.PathByteSize, wal.SegmentSize)
if err != nil {
return nil, fmt.Errorf("failed to initialize wal: %w", err)
}

ledgerStorage, err = ledger.NewSyncLedger(diskWAL, int(e.exeConf.mTrieCacheSize), collector, node.Logger.With().Str("subcomponent",
"ledger").Logger(), ledger.DefaultPathFinderVersion)
return ledgerStorage, err
}).
Component("execution state ledger WAL compactor", func(node *NodeConfig) (module.ReadyDoneAware, error) {

checkpointer, err := ledgerStorage.Checkpointer()
if err != nil {
return nil, fmt.Errorf("cannot create checkpointer: %w", err)
}
compactor := wal.NewCompactor(checkpointer,
10*time.Second,
return ledger.NewCompactor(
ledgerStorage,
diskWAL,
e.exeConf.checkpointDistance,
e.exeConf.checkpointsToKeep,
node.Logger.With().Str("subcomponent", "checkpointer").Logger())

return compactor, nil
node.Logger.With().Str("subcomponent", "checkpointer").Logger(),
)
}).
Component("execution data service", func(node *NodeConfig) (module.ReadyDoneAware, error) {
err := os.MkdirAll(e.exeConf.executionDataDir, 0700)
Expand Down
6 changes: 3 additions & 3 deletions cmd/util/cmd/exec-data-json-export/ledger_exporter.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,13 +24,13 @@ func ExportLedger(ledgerPath string, targetstate string, outputPath string) erro
if err != nil {
return fmt.Errorf("cannot create WAL: %w", err)
}
defer func() {
<-diskWal.Done()
}()
led, err := complete.NewLedger(diskWal, complete.DefaultCacheSize, &metrics.NoopCollector{}, log.Logger, 0)
if err != nil {
return fmt.Errorf("cannot create ledger from write-a-head logs and checkpoints: %w", err)
}
defer func() {
<-led.Done()
}()
stateBytes, err := hex.DecodeString(targetstate)
if err != nil {
return fmt.Errorf("failed to decode hex code of state: %w", err)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,9 +43,6 @@ func extractExecutionState(
if err != nil {
return fmt.Errorf("cannot create disk WAL: %w", err)
}
defer func() {
<-diskWal.Done()
}()

led, err := complete.NewLedger(
diskWal,
Expand All @@ -56,6 +53,9 @@ func extractExecutionState(
if err != nil {
return fmt.Errorf("cannot create ledger from write-a-head logs and checkpoints: %w", err)
}
defer func() {
<-led.Done()
}()

var migrations []ledger.Migration
var preCheckpointReporters, postCheckpointReporters []ledger.Reporter
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,6 @@ func TestExtractExecutionState(t *testing.T) {
blocksInOrder[i] = blockID
}

<-diskWal.Done()
<-f.Done()
err = db.Close()
require.NoError(t, err)
Expand Down Expand Up @@ -181,7 +180,6 @@ func TestExtractExecutionState(t *testing.T) {
require.Error(t, err)
}

<-diskWal.Done()
<-storage.Done()
})
}
Expand Down
1 change: 0 additions & 1 deletion ledger/complete/checkpoint_benchmark_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -284,7 +284,6 @@ func benchmarkNewCheckpointRandomData(b *testing.B, segmentCount int) {
b.Fatal(err)
}

<-wal1.Done()
<-led.Done()

wal2, err := wal.NewDiskWAL(
Expand Down
Loading