Different convergence behavior of gradient-based M-steps with `seq` and `mpi` run policy #41

jdrefs · 2023-03-23T21:58:21Z

TVAE (and presumably other models that use gradient-based optimizers) converges significantly differently under sequential and distributed execution. Models that use analytical M-step updates (e.g. BSC) do not show a similar effect.

For example, in a bars test run sequentially, the differences between converged and ground-truth lower bounds were approximately in the range [0, 1] for both TVAE and BSC. Under distributed execution, the lower-bound differences w.r.t. ground truth stayed similar for BSC but increased to approximately [0, 7] for TVAE (i.e., TVAE fails to approach good optima in this case).
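A possible sanity check, as a sketch only: the cause is not confirmed, and the snippet below uses neither the project's code nor MPI. It assumes, purely as a hypothesis, that the discrepancy could stem from how per-rank gradients are combined. A toy least-squares gradient (the hypothetical `grad_sum`) is split across three emulated "ranks" with unequal shard sizes. Summing the shard gradients reproduces the sequential gradient, whereas an unweighted mean of per-rank mean gradients generally does not match the full-data mean.

```python
import numpy as np

def grad_sum(w, X, y):
    """Gradient of 0.5 * sum((X @ w - y)**2) w.r.t. w, summed over data points."""
    return X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
w = rng.normal(size=3)

# Hypothetical split of the data across three "ranks" with unequal shard sizes.
shards = [slice(0, 5), slice(5, 8), slice(8, 10)]

# Sequential reference: gradient over the full data set.
full = grad_sum(w, X, y)

# Emulates an all-reduce with SUM over per-rank gradients.
reduced = sum(grad_sum(w, X[s], y[s]) for s in shards)

# Per-rank mean gradients followed by an unweighted mean across ranks.
mean_of_means = np.mean(
    [grad_sum(w, X[s], y[s]) / (s.stop - s.start) for s in shards], axis=0
)
full_mean = full / len(y)

sum_matches = np.allclose(full, reduced)
mean_matches = np.allclose(full_mean, mean_of_means)
print("sum of shard gradients matches sequential:", sum_matches)
print("unweighted mean of shard means matches sequential:", mean_matches)
```

If the analogous comparison on the real TVAE gradients shows a mismatch between `seq` and `mpi`, the gradient reduction would be a natural place to look; if the gradients agree, the difference presumably lies elsewhere (e.g. optimizer state or batching).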
