Releases: awslabs/sockeye

2.3.22 (released 30 Sep 09:41, commit a0bc3f0)

[2.3.22]

Fixed

  • The previous commit introduced a regression in vocabulary creation: the vocabulary was built over input characters rather than over tokens.

[2.3.21]

Added

  • Extended parallelization of data preparation to vocabulary and statistics creation while minimizing the overhead of sharding.

[2.3.20]

Added

  • Added debug logging for restrict_lexicon lookups

[2.3.19]

Changed

  • When training only the decoder (--fixed-param-strategy all_except_decoder), disable autograd for the encoder and embeddings to save memory.
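
A hedged usage sketch; only the --fixed-param-strategy flag is taken from the entry above, and the remaining required training arguments are omitted (...):

```bash
# Update only the decoder; encoder and embedding parameters stay fixed.
python -m sockeye.train ... --fixed-param-strategy all_except_decoder
```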

[2.3.18]

Changed

2.3.17 (released 17 Jun 10:50, commit ef908e3)

[2.3.17]

Added

  • Added an alternative, faster implementation of greedy search. Passing the --greedy flag to sockeye.translate enables it. This implementation does not support hypothesis scores, batch decoding, or lexical constraints.
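
A minimal sketch of enabling the faster greedy search; the model folder and file names are placeholders, and the surrounding flags are the usual translate I/O arguments:

```bash
python -m sockeye.translate -m model_dir -i input.txt -o output.txt --greedy
```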

[2.3.16]

Added

[2.3.15]

Changed

  • Optimization: Decoder class is now a complete HybridBlock (no forward method).

2.3.14 (released 07 Apr 11:50, commit 587eb7f)

[2.3.14]

Changed

  • Updated to MXNet 1.8.0
  • Removed dependency support for CUDA 9.2 (no longer supported by MXNet 1.8).
  • Added dependency support for CUDA 11.0 and 11.2.
  • Updated Python requirement to 3.7 and later (removed the dataclasses backport requirement).

[2.3.13]

Added

  • Target factors are now also collected for nbest translations (and stored in the JSON output handler).

[2.3.12]

Added

  • Added a --config option to the prepare_data CLI to allow setting command-line flags via a YAML config (see the sketch below).
  • Flags for the prepare_data CLI are now stored in the output folder under args.yaml
    (equivalent to the behavior of sockeye_train)
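
A hedged sketch of the new workflow; prepare_args.yaml and prepared_data are placeholder names, and the YAML keys are assumed to mirror the prepare_data argument names:

```bash
# Hypothetical: set prepare_data flags via a YAML file instead of the command line.
python -m sockeye.prepare_data --config prepare_args.yaml --output prepared_data
# The flags that were used are then stored as prepared_data/args.yaml.
```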

[2.3.11]

Added

  • Added option prevent_unk to avoid generating the <unk> token in beam search.
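
A hedged usage sketch, assuming the option is exposed on the sockeye.translate command line as --prevent-unk (model and file names are placeholders):

```bash
python -m sockeye.translate -m model_dir -i input.txt -o output.txt --prevent-unk
```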

2.3.10 (released 08 Feb 10:00, commit c3870e3)

[2.3.10]

Changed

  • Make sure that the top N best params files are retained, even if N > --keep-last-params. This ensures that model
    averaging is not crippled when keeping only a few params files during training, which can save a significant
    amount of disk space.

[2.3.9]

Added

2.3.8 (released 08 Jan 08:10, commit b6d8d35)

[2.3.8]

Fixed

  • Fixed the problem identified in issue #925 that caused learning rate
    warmup to fail in some cases when continuing training.

[2.3.7]

Changed

  • Use dataclass module to simplify Config classes. No functional change.

[2.3.6]

Fixed

  • Fixes the problem identified in issue #890, where the lr_scheduler
    does not behave as expected when continuing training. The problem is
    that the lr_scheduler is kept as part of the optimizer, but the
    optimizer is not saved when saving state. Therefore, every time
    training is restarted, a new lr_scheduler is created with initial
    parameter settings. Fixed by saving and restoring the lr_scheduler
    separately.

[2.3.5]

Fixed

  • Fixed issue with LearningRateSchedulerPlateauReduce.__repr__ printing
    out num_not_improved instead of reduce_num_not_improved.

[2.3.4]

Fixed

  • Fixed issue with dtype mismatch in beam search when translating with --dtype float16.
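
For reference, a sketch of the kind of invocation affected by this fix (model and file names are placeholders):

```bash
python -m sockeye.translate -m model_dir -i input.txt -o output.txt --dtype float16
```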

[2.3.3]

Changed

  • Upgraded SacreBLEU dependency of Sockeye to a newer version (1.4.14).

2.3.2 (released 18 Nov 13:41, commit 26c02b1)

[2.3.2]

Fixed

  • Fixed edge case that unintentionally skips softmax for sampling if beam size is 1.

[2.3.1]

Fixed

  • Optimizing for BLEU/CHRF with horovod required the secondary workers to also create checkpoint decoders.

[2.3.0]

Added

  • Added support for target factors.
    If provided with additional target-side tokens/features (token-parallel to the regular target-side) at training time,
    the model can now learn to predict these in a multi-task setting. You can provide target factor data similar to source
    factors: --target-factors <factor_file1> [<factor_fileN>]. During training, Sockeye optimizes one loss per factor
    in a multi-task setting. The weight of the losses can be controlled by --target-factors-weight.
    At inference, target factors are decoded greedily; they do not participate in beam search.
    The predicted factor at each time step is the argmax over its separate output
    layer distribution. To receive the target factor predictions at inference time, use
    --output-type translation_with_factors.
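
A hedged end-to-end sketch using only the flags named above; file and folder names are placeholders, the factor weight is illustrative, and all other required training arguments are omitted (...):

```bash
# Train with one target factor file that is token-parallel to the target side.
python -m sockeye.train ... \
    --target-factors factors.train.txt \
    --target-factors-weight 0.5

# At inference, emit the greedily decoded factor predictions alongside the translation.
python -m sockeye.translate -m model_dir ... --output-type translation_with_factors
```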

Changed

  • load_model(s) now returns a list of target vocabs.
  • Default source factor combination changed to sum (was concat before).
  • SockeyeModel class has three new properties: num_target_factors, target_factor_configs,
    and factor_output_layers.

2.2.8 (released 05 Nov 14:06, commit cbe9bff)

[2.2.8]

Changed

  • Make source/target data parameters required for the scoring CLI to avoid cryptic error messages.

[2.2.7]

Added

  • Added an argument to specify the log level of secondary workers. Defaults to ERROR to hide any logs except for exceptions.

[2.2.6]

Fixed

  • Avoid a crash due to an edge case when no model improvement has been observed by the time the learning rate gets reduced for the first time.

[2.2.5]

Fixed

  • Enforce sentence batching for the sockeye score tool and set the default batch size to 56.

[2.2.4]

Changed

  • Use softmax with length in DotAttentionCell.
  • Use contrib.arange_like in AutoRegressiveBias block to reduce number of ops.

[2.2.3]

Added

  • Log the absolute number of <unk> tokens in source and target data

[2.2.2]

Fixed

  • Fix: Guard against division by zero for small batch sizes.

[2.2.1]

Fixed

  • Fixes a corner-case bug by which the beam decoder could wrongly return a best hypothesis with a score of negative infinity.

2.2.0 (released 04 Oct 17:22, commit 9014405)

[2.2.0]

Changed

  • Replaced multi-head attention with interleaved_matmul_encdec operators, which removes previously needed transposes and improves performance.

  • Beam search states and model layers now assume time-major format.

[2.1.26]

Fixed

  • Fixes a backwards incompatibility introduced in 2.1.17, which would prevent models trained with prior versions from being used for inference.

[2.1.25]

Changed

  • Reverting PR #772 as it causes issues with amp.

[2.1.24]

Changed

  • Make sure to write a final checkpoint when stopping with --max-updates, --max-samples or --max-num-epochs.

[2.1.23]

Changed

  • Updated to MXNet 1.7.0.
  • Re-introduced use of softmax with length parameter in DotAttentionCell (see PR #772).

[2.1.22]

Added

  • Re-introduced --softmax-temperature flag for sockeye.score and sockeye.translate.
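
A hedged usage sketch; the temperature value, model folder, and file names are illustrative:

```bash
python -m sockeye.translate -m model_dir -i input.txt -o output.txt --softmax-temperature 0.8
```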

2.1.21 (released 27 Aug 13:31, commit f68a217)

[2.1.21]

Added

  • Added an optional ability to cache the model's encoder outputs.

[2.1.20]

Fixed

  • Fixed a bug where the training state object was saved to disk before training metrics were added to it, leading to an inconsistency between the training state object and the metrics file (see #859).

[2.1.19]

Fixed

  • When loading a shard in Horovod mode, there is now a check that each non-empty bucket contains enough sentences to cover each worker's slice. If not, the bucket's sentences are replicated to guarantee coverage.

[2.1.18]

Fixed

  • Fixed a bug where sampling translation fails because an array is created in the wrong context.

2.1.17 (released 20 Aug 18:25, commit 92a020a)

[2.1.17]

Added

  • Added layers.SSRU, which implements a Simpler Simple Recurrent Unit as described in
    Kim et al., "From Research to Production and Back: Ludicrously Fast Neural Machine Translation", WNGT 2019.

  • Added ssru_transformer option to --decoder, which enables the usage of SSRUs as a replacement for the decoder-side self-attention layers.
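
A hedged sketch of selecting the SSRU-based decoder at training time; all other required training arguments are omitted (...):

```bash
python -m sockeye.train ... --decoder ssru_transformer
```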

Changed

  • Reduced the number of arguments for MultiHeadSelfAttention.hybrid_forward():
    previous_keys and previous_values are now passed together as previous_states, a list containing two symbols.