Releases: awslabs/sockeye

1.10.5

08 Nov 10:45
88940a7

[1.10.5]

Fixed

  • Fixed yet another bug with the data iterator.

[1.10.4]

Fixed

  • Fixed a bug with the revised data iterator not correctly appending EOS symbols for variable-length batches.
    This partially reverts the change introduced in 1.10.1; the behavior is now correct again.

1.10.3

07 Nov 08:18
90a7e47

[1.10.3]

Changed

  • Fixed a bug where max_observed_{source,target}_len was computed on the complete data set instead of only on
    the sentences actually added to the buckets based on --max-seq-len.

[1.10.2]

Added

  • --max-num-epochs flag to train for a maximum number of passes through the training data.
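As a rough illustration, the new flag can be added to a standard training invocation; in the sketch below the data paths and output directory are placeholder assumptions, not taken from this release:

    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output model_dir \
        --max-num-epochs 10

Training then runs for at most 10 full passes over the training data.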

Update to MXNet 0.12.0

02 Nov 10:40
375e9b4

[1.10.1]

Changed

  • Reduced memory footprint when creating data iterators: integer sequences
    are streamed from disk when being assigned to buckets.

[1.10.0]

Changed

  • Updated MXNet dependency to 0.12 (w/ MKL support by default).
  • Renamed --smoothed-cross-entropy-alpha to --label-smoothing (see the migration sketch after this list).
    Label smoothing should now require significantly less memory due to its addition to MXNet's SoftmaxOutput operator.
  • --weight-normalization now applies not only to convolutional weight matrices, but to output layers of all decoders.
    It is also independent of weight tying.
  • Transformers now use --embed-dropout. Previously they used --transformer-dropout-prepost for this.
  • Transformers now scale their embedding vectors before adding fixed positional embeddings.
    This turns out to be crucial for effective learning.
  • .param files now use 5-digit identifiers to reduce the risk of overflow when saving many checkpoints.
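As a hedged migration sketch for the renamed smoothing flag (the value 0.1 and the surrounding data paths are illustrative assumptions, not taken from this release):

    # up to 1.9.x (flag now removed):
    #   python -m sockeye.train ... --smoothed-cross-entropy-alpha 0.1
    # from 1.10.0 on:
    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output model_dir \
        --label-smoothing 0.1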

Added

  • Added CUDA 9.0 requirements file.
  • --loss-normalization-type. New flag to control loss normalization. The new default is to normalize
    by the number of valid (non-PAD) tokens instead of by the batch size.
  • --weight-init-xavier-factor-type. New flag to control the Xavier factor type when --weight-init=xavier.
  • --embed-weight-init. New flag for the initialization of embedding matrices.
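A hedged sketch combining the new initialization and loss flags from this release (the specific values avg, default, and valid are assumptions about the accepted choices, and the data paths are placeholders):

    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output model_dir \
        --weight-init xavier \
        --weight-init-xavier-factor-type avg \
        --embed-weight-init default \
        --loss-normalization-type valid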

Removed

  • --smoothed-cross-entropy-alpha argument. See above.
  • --normalize-loss argument. See above.

[1.9.0]

Added

  • Batch decoding. New options for the translate CLI: --batch-size and --chunk-size. Translator.translate()
    now accepts and returns lists of inputs and outputs.
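A hedged sketch of batch decoding from the command line (file names and values are illustrative; only --batch-size and --chunk-size are new in this release):

    python -m sockeye.translate \
        --models model_dir \
        --input test.de \
        --output test.en \
        --batch-size 16 \
        --chunk-size 512

Input sentences are then read and translated in chunks, with each chunk decoded in batches.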

[1.8.4]

Added

  • Exposed the MXNet KVStore through the --kvstore argument, potentially enabling distributed training.
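As a hedged sketch, the store type is presumably passed through to MXNet; the value dist_sync below is a standard MXNet KVStore name, assumed here rather than taken from this release, and the data paths are placeholders:

    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output model_dir \
        --kvstore dist_sync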

[1.8.3]

Added

  • Optional smart rollback of parameters and optimizer states after the learning rate is reduced because the
    validation metric has not improved for a given number of checkpoints. New flags: --learning-rate-decay-param-reset,
    --learning-rate-decay-optimizer-states-reset.
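A hedged sketch of how these flags might be combined with the existing learning-rate-reduction options (the reduce flags, their values, and the value best are assumptions about the interface, not taken from this release):

    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output model_dir \
        --learning-rate-reduce-factor 0.7 \
        --learning-rate-reduce-num-not-improved 8 \
        --learning-rate-decay-param-reset \
        --learning-rate-decay-optimizer-states-reset best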

[1.8.2]

Fixed

  • The RNN variational dropout mask is now independent of the input
    (previously any zero initial state led to the first state being canceled).
  • Correctly pass self.dropout_inputs float to mx.sym.Dropout in VariationalDropoutCell.

[1.8.1]

Changed

  • Instead of truncating sentences that exceed the maximum input length, they are now translated in chunks.

Conv2seq models

10 Oct 08:11

Added

  • Convolutional decoder.
  • Weight normalization (for CNN only so far).
  • Learned positional embeddings for the transformer.

Changed

  • --attention-* CLI params renamed to --rnn-attention-*.
  • --transformer-no-positional-encodings generalized to --transformer-positional-embedding-type.
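A hedged sketch of the renamed and generalized flags (the values dot and learned are assumptions about the accepted choices; the data paths and the --encoder/--decoder selections are placeholders):

    # RNN models: attention flags now carry the rnn- prefix
    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output rnn_model \
        --rnn-attention-type dot

    # Transformer models: positional embeddings are now selected by type
    # instead of an on/off --transformer-no-positional-encodings switch
    python -m sockeye.train \
        --source train.de --target train.en \
        --validation-source dev.de --validation-target dev.en \
        --output transformer_model \
        --encoder transformer --decoder transformer \
        --transformer-positional-embedding-type learned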

Updated Word batching

10 Oct 08:11
  • Word batching update: guarantee the default bucket has the largest batch size.
  • Comments/logic for clarity.
  • Address PR comments.
  • Memory usage note.
  • NamedTuple for bucket batch sizes.

Transformer models

10 Oct 08:13
  • Added transformer models (Vaswani et al., 2017) to Sockeye.