Skip to content
/ FTSEmu Public

Emulating Distributed Fault Tolerant Termination Detection with Safra's Algorithm

License

Notifications You must be signed in to change notification settings

gkrls/FTSEmu

Repository files navigation

FTSEmu

A multi-threaded distributed system emulator. It has been developed as a platform for testing and analyzing a novel Fault-Tolerant, Distributed Termination Detection algorithm based on the algorithm by Shmuel Safra.

An implementation of the fault-tolerant algorithm has been applied to a fault-tolerant version of the Chandy-Misra routing algorithm as well as a fault-tolerant version of the Afek-Kutten-Yung self-stabilizing spanning tree algorithm. The implementation can be found here.

More:
[1] Georgios Karlos, Wan Fokkink, Per Fuchs - Fault-Tolerant Termination Detection with Safra's Algorithm
[2] Georgios Karlos - Emulation of a Fault-Tolerant Variant of Safra's Termination Detection Algoritm
[3] Edsger W. Dijkstra - Shmuel Safra's Version of Termination Detection

Usage

  • Build
$ ant jar
  • Run
$ java -jar TDS-0.1.jar [OPTION]...  

Options

Option Description Possible Parameters Default
-h Print help - off
-v Verbose output - off
-ver Algorithm version 1 (OFSS), 2 (IFSS), 3 (FTS), 0 (All) 0
-n Number of nodes [2, 1024] 4
-c Number of crashing nodes -c : random
-c low high: crash nodes in 'low' to 'high' interval
0
-l Activity level 1, 2 1
-w Max wait time for a run (ms) >0 100000
-csv Write performance metrics to a csv file - off
-f Clean (flush) the csv (if one exists with a name corresponding to this run's params) - off
-dist Choose a probability distribution by which random activity is determined -(1) : Uniform
-(2) : Gaussian
-strategy Choose a strategy by which activity is happening (random under chose distribution), when a node becomes active -(1) : Compute-Send - Perform some computation and then send some messages (uniform 0-3, gaussian mean=1 sd=1)
-(2) : N-Activities - Choose an N under given distribution. For each of the n activities randomly choose if computation of message
1
-batype Basic Algorithm type -(1) : Centralized (Node 0 is initially active)
-(2) : Decentralized-Even (Nodes with even id's are initially active)
-(3) : Decentralized-Random (A random number of nodes, uniformly chosen, is initially active)
2
-ci Interval between 2 consecutive node crashes -(1) : Uniform in [200, 3000] ms
-(2) : Gaussian with mean 1000ms and sd 200ms
1
-anl Average network latency Value in [20, 100] (Message delays are random with Gaussian distribution with mean -avl and standard deviation of 10 ms 60

Verbose Output

This is a typical output when running with option -v and default -ver

simulator screenshot

Issues

[1] There is currently a bug introduced after the migration. When all versions run at the same time, occasionally, one will hang with no activity taking place for that version.

About

Emulating Distributed Fault Tolerant Termination Detection with Safra's Algorithm

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published