Skip to content

Negotiation algorithm

Marios Fragkoulis edited this page Dec 6, 2016 · 28 revisions

The workflow in 6 steps

  1. Each process in a dgsh graph registers its identity and I/O constraints and capabilities associated with its input and output channels (e.g. must have exactly two inputs, can supply an unlimited number of outputs). In this way, each node's edges, that is the topology of the graph, for both input and output channels are recorded.
  2. The connection algorithm tries to satisfy the constraints per tool per I/O channel by matching the requirements of endpoint pairs.
  3. The algorithm tries to optimise the solution by leveraging any flexible constraints of processes in the graph.
  4. The solution is communicated to all processes.
  5. Processes with outgoing connections create communication endpoints as designated by the solution and dispatch the endpoints for receiving input from their output channel.
  6. Processes receive the appropriate communication endpoints and begin execution.

DGSH negotiation specifics

Dgsh-aware programs within dgsh blocks ({{ ... }}) or connected by a pipe to or from them:

  • are connected with a socketpair and negotiate through it by passing packets from one end of the dgsh-aware pipleline to the other.
  • test two environment variables, DGSH_IN and DGSH_OUT, to find out on which end they can negotiate.
  • One program with DGSH_OUT && !DGSH_IN starts the negotiation and continues by reading, writing on its output fd.
  • Programs with DGSH_IN && !DGSH_OUT read and write on their input fd.
  • All other programs start by reading from the input side and writing to the output side, and then alternate I/O directions.
  • Concentrators (see below) are used to allow each process to have at most one negotiation input and one output. In addition, if more than one programs with DGSH_OUT && !DGSH_IN are present, a concentrator starts the negotiation.
  • For example, the negotiation for a graph containing the above processes would happen in the order: X0, a, X1, b, c, d, X2, e, f, e, X2, g, X2, h, X3, i, X3, j, X3, h, X2, d, c, b, X1, k, l, X0. Note the following.
    • X_N_ are concentrators that are inserted by the shell.
  • The communicated message block contains
    • the process id of the program that created it,
    • the graph of connected programs,
    • the number of I/O channels each one provides and requires,
    • the run flag,
    • the error flag,
    • (The number of I/O channels can be -1 to denote flexibility.)
  • Programs that fail their argument processing set the message block error flag.
  • The starting program (including a concentrator) that receives the message block back it:
    • Runs the algorithm to allocate I/O ports along the graph
    • Sets the run flag if a solution is found
    • Sets the error flag and print an error message if the negotiation requests cannot be satisfied
    • Sets the ID of the process that passed them the message block
  • Non-terminal processes that get the message block with the run flag set a second time and terminal processes that get the message block with the run flag set:
    • Pass the block (except for the process whose ID is set)
    • Create required output pipes
    • Read input fds from their input side
    • Write output fds to their output side
    • Start running
  • Non-terminal processes that get the message block with the error flag set a second time and terminal processes that get the message block with the error flag set:
    • Pass the block (except for the process whose ID is set)
    • Exit with a non-zero exit code
  • At the end of the negotiation, processes have the first input file descriptor provided to them, if any, substitute stdin and the first output file descriptor substitute stdout respectively. The dgsh negotiate API will return all file descriptors for each channel, but, as mentioned above, the first one has already been taken care of.

Within a {{ }} block executable commands and {{ }} blocks can offer multiple inputs or outputs. All other structures offer a single I/O.

When dgsh is run it passes DGSH_IN (if set) to the first process of the script, and DGSH_OUT (if set) to the last one.

A dgsh {{ }} block with more than one input and DGSH_IN not set, or more than one output and DGSH_OUT not set terminates with an error.

Concentrators

All programs, even those that support more than one input or output, negotiate through exactly one input and one output channel. At the beginning and the end of the {{ }} block multiple programs may need to communicate with a single one. In that case an input or output concentrator gets connected to the negotiation fds of the commands within the block and negotiates with the block's communicating parties. Thus the concentrators create a ring traversing all edges of the DAG.

  • Concentrators are passive components,but they can initiate a negotiation and calculate a solution when more than one programs with DGSH_IN and !DGSH_OUT are present. Otherwise, they just pass around negotiation blocks.

  • Negotiation from multiple input programs is received on fds 0, 3, 4, ... while negotiation to multiple output programs is sent on fds 1, 3, 4 , ... as indicated in the figure below.

Input concentrator Output concentrator

  • The concentrator dgsh-conc is invoked by the shell with arguments specifying whether it is concentrating input or output, and the number of input or output channels.
  • Concentrators that get the message block with the run flag set on a given port:
    • Read the block and pass it to the next port connected to them
    • Mark a port as run-ready after reading and writing to it once
    • Read input fds from run-ready input port(s)
    • When all ports are marked as run-ready, gather or scatter input fds and write them as output fds to the output port(s)
    • Exit
  • Two adjacent multipipe blocks are joined by an input concentrator communicating with an output concentrator.