Releases: awslabs/sockeye

1.18.23

13 Jun 07:34
bea8e6a

Fixed

  • Correctly create the convolutional embedding layers when the encoder is set to transformer-with-conv-embed. Previously, no convolutional layers were added, so a standard Transformer model was trained instead.
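
    For reference, a minimal training command that selects this encoder might look as follows; this is a sketch
    with placeholder data paths and output directory:

        python -m sockeye.train \
            --source train.source --target train.target \
            --validation-source dev.source --validation-target dev.target \
            --output conv_embed_model \
            --encoder transformer-with-conv-embed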

1.18.22

12 Jun 09:49
83468d2

Fixed

  • Make sure the default bucket is large enough with word-based batching when the source is longer than the target.
    Previously, there was an edge case in which memory usage was sub-optimal when word-based batching was combined
    with source sentences longer than their target sentences.
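
    Word-based batching itself is selected via the batch-type flag; a minimal sketch, where the batch size of
    4096 target words is an illustrative value rather than a recommendation from this release:

        python -m sockeye.train \
            --source train.source --target train.target \
            --validation-source dev.source --validation-target dev.target \
            --output model \
            --batch-type word --batch-size 4096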

1.18.21

04 Jun 12:06
f56e783

Fixed

  • Constrained decoding was missing a crucial cast.
  • Fixed the test cases that should have caught this.

1.18.20

27 May 19:01
163dec8

Changed

  • Transformer parametrization flags (model size, number of attention heads, feed-forward layer size) can now
    optionally be defined separately for the encoder and decoder. For example, to use a different transformer model
    size for the encoder, pass --transformer-model-size 1024:512.
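
    A sketch of the encoder:decoder syntax applied to all three flags; the values are illustrative, and it is
    assumed here that the colon syntax works the same way for the attention-head and feed-forward flags as for
    the model size:

        python -m sockeye.train \
            --source train.source --target train.target \
            --validation-source dev.source --validation-target dev.target \
            --output model \
            --encoder transformer --decoder transformer \
            --transformer-model-size 1024:512 \
            --transformer-attention-heads 16:8 \
            --transformer-feed-forward-num-hidden 4096:2048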

[1.18.19]

Added

  • LHUC is now supported in transformer models
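
    A sketch of enabling LHUC for a transformer model, assuming the --lhuc flag introduced with RNN support in
    1.18.7 and that 'all' is accepted as a component value:

        python -m sockeye.train \
            --source train.source --target train.target \
            --validation-source dev.source --validation-target dev.target \
            --output model \
            --encoder transformer --decoder transformer \
            --lhuc all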

[1.18.18]

Added

  • [Experimental] Introducing the image captioning module. Supported model type: ConvNet encoder with Sockeye NMT
    decoders. The module also includes a feature extraction script, an image-text iterator that loads features,
    training and inference pipelines, and a visualization script that loads images and captions. See the tutorial
    for usage. This module is experimental, so its maintenance is not fully guaranteed.

1.18.17

24 May 12:08
8835331

Changed

  • Updated to MXNet 1.2
  • Switched to the new LayerNormalization operator to save GPU memory.

[1.18.16]

Fixed

  • Removed summation of gradient arrays when logging gradients.
    This clogged the memory on the primary GPU device over time as checkpoints accumulated.
    Gradient histograms are now logged to Tensorboard separately per device.

1.18.15

23 May 08:55

Added

  • Added decoding with target-side lexical constraints (documentation in tutorials/constraints).
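
    Constraints are passed as JSON-formatted input, per the constraints tutorial; a sketch, assuming the
    --json-input flag of sockeye.translate, with placeholder model directory and text:

        echo '{ "text": "das Haus ist klein", "constraints": ["the house"] }' \
            | python -m sockeye.translate --models model --json-input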

[1.18.14]

Added

  • Introduced Sockeye Autopilot for single-command end-to-end system building.
    See the Autopilot documentation and run with: sockeye-autopilot.
    Autopilot is a contrib module with its own tests that are run periodically.
    It is not included in the comprehensive tests run for every commit.
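
    A hypothetical invocation, assuming Autopilot exposes --task and --model options as described in its
    documentation; both the task and model names below are placeholders:

        sockeye-autopilot --task wmt14_en_de --model transformer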

1.18.13

19 May 16:35
3f8cb0b

Fixed

  • Fixed two bugs with training resumption:
    1. Removed an overly strict assertion in the data iterator for model states before the first checkpoint.
    2. Removed deletion of the Tensorboard log directory.

Added

  • Added support for config files. Command line parameters have precedence over the values read from the config
    file. Minimal working example:

        python -m sockeye.train --config config.yaml

    with contents of config.yaml as follows:

        source: source.txt
        target: target.txt
        output: out
        validation_source: valid.source.txt
        validation_target: valid.target.txt
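
    Since command line parameters take precedence, individual config values can be overridden on the command
    line; a sketch, where the alternate output directory is illustrative:

        python -m sockeye.train --config config.yaml --output other_out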

Changed

  • The full set of arguments is serialized to out/args.yaml at the beginning of training (previously JSON was used).

[1.18.12]

Changed

  • An additional end-of-sentence (EOS) symbol is now appended to all source-side sequences. This change is
    backwards compatible, meaning that inference with older models still works without the EOS symbol.

[1.18.11]

Changed

  • Default training parameters have been changed to reflect the setup used in our arXiv paper. Specifically, the
    default is now to train a 6-layer Transformer model with word-based batching. The only difference from the paper
    is that weight tying is still turned off by default, as there may be use cases in which tying the source and
    target vocabularies is not appropriate. Turn it on using --weight-tying --weight-tying-type=src_trg_softmax.
    Additionally, BLEU scores from a checkpoint decoder are now monitored by default.

1.18.10

08 May 15:04
467c23f

Fixed

  • Re-allowed early stopping w.r.t. BLEU.

1.18.9

07 May 08:48
3c1a8c5

Fixed

  • Fixed a problem with LHUC boolean flags being passed as None.

Added

  • Reorganized beam search. Normalization is applied only to completed hypotheses, and pruning of hypotheses
    (by log probability against the highest-scoring completed hypothesis) can be specified with --beam-prune X.
  • Enabled stopping at the first completed hypothesis with --beam-search-stop first (the default is 'all').
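
    A sketch combining both options at inference time; the model path, input files, and pruning threshold are
    illustrative:

        python -m sockeye.translate --models model \
            --beam-prune 3 \
            --beam-search-stop first < input.txt > output.txt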

1.18.8

04 May 11:35
2725ce1

Removed

  • Removed Tensorboard logging of embedding & output parameters at every checkpoint. This used a lot of disk space.

[1.18.7]

Added

  • Added support for LHUC in RNN models (David Vilar, "Learning Hidden Unit
    Contribution for Adapting Neural Machine Translation Models", NAACL 2018).

Fixed

  • Word-based batching with very small batch sizes.