
Releases: awslabs/sockeye

3.1.4

10 Mar 09:14
edac700

[3.1.4]

Added

  • Added support for specifying a target prefix and target prefix factors in the JSON input during inference.
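
A minimal sketch of such a JSON input. The key names text, target_prefix, and target_prefix_factors are assumptions for illustration; the exact schema is defined in the Sockeye inference documentation.

```python
import json

# Hypothetical example input line for sockeye-translate with JSON input.
# Key names are assumptions; consult the Sockeye docs for the actual schema.
sample = {
    "text": "the quick brown fox",
    "target_prefix": "der schnelle",
    "target_prefix_factors": ["O O"],
}
print(json.dumps(sample))  # one JSON object per input line
```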

3.1.3

28 Feb 09:43
ea08143

[3.1.3]

Added

  • Added support for specifying source prefixes in the JSON input during inference.
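
A minimal sketch of a JSON input carrying a source prefix. The key name source_prefix is an assumption for illustration; the exact schema is defined in the Sockeye inference documentation.

```python
import json

# Hypothetical example; the source_prefix key name is an assumption for illustration.
sample = {"text": "the quick brown fox", "source_prefix": "<2de>"}
print(json.dumps(sample))
```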

[3.1.2]

Changed

  • Optimized creation of source length mask by using expand instead of repeat_interleave.
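
A minimal PyTorch sketch of the idea, with illustrative mask shapes that do not necessarily match Sockeye's internals: repeat_interleave materializes copies of the mask, while expand returns a view over the same storage.

```python
import torch

batch, num_heads, src_len = 2, 8, 6
lengths = torch.tensor([4, 6])

# Length mask marking padding positions, shape (batch, 1, src_len).
mask = (torch.arange(src_len).unsqueeze(0) >= lengths.unsqueeze(1)).unsqueeze(1)

# repeat_interleave allocates a new tensor with one copy of the mask per head ...
copied = mask.repeat_interleave(num_heads, dim=1)   # (batch, num_heads, src_len)

# ... whereas expand broadcasts the singleton head dimension without allocating.
viewed = mask.expand(-1, num_heads, -1)             # (batch, num_heads, src_len)
assert viewed.data_ptr() == mask.data_ptr()
```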

[3.1.1]

Changed

  • Updated torch dependency to 1.10.x (torch>=1.10.0,<1.11.0)

3.1.0

11 Feb 09:25
cc7922e

[3.1.0]

Sockeye is now exclusively based on Pytorch.

Changed

  • Renamed x_pt modules to x. Updated entry points in setup.py.

Removed

  • Removed MXNet from the codebase
  • Removed device locking / GPU acquisition logic. Removed dependency on portalocker.
  • Removed arguments --softmax-temperature, --weight-init-*, --mc-dropout, --horovod, --device-ids
  • Removed all MXNet-related tests

3.0.15

09 Feb 19:13
9e11f7b

[3.0.15]

Fixed

  • Fixed GPU-based scoring by copying tensors to the CPU before converting them to numpy.
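
For context, a CUDA tensor cannot be converted to numpy directly; it has to be moved to host memory first:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
scores = torch.tensor([0.1, 0.2], device=device)

# scores.numpy() raises a TypeError for CUDA tensors; copy to the CPU first.
scores_np = scores.cpu().numpy()
```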

[3.0.14]

Added

  • Added support for Translation Error Rate (TER) metric as implemented in sacrebleu==1.4.14.
    Checkpoint decoder metrics will now include TER scores and early stopping can be determined
    via TER improvements (--optimized-metric ter)
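
A minimal sketch of computing TER with sacrebleu, shown with the class-based API of sacrebleu 2.x; the exact entry point in sacrebleu==1.4.14 may differ.

```python
from sacrebleu.metrics import TER

hypotheses = ["the cat sat on the mat"]
references = [["the cat is on the mat"]]  # one list per reference stream

# Lower TER is better; early stopping with --optimized-metric ter tracks
# improvements of this score on the checkpoint decoder output.
print(TER().corpus_score(hypotheses, references).score)
```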

3.0.13

03 Feb 12:08
cc96656

[3.0.13]

Changed

  • Use expand instead of repeat for attention masks to avoid allocating additional memory.
  • Avoid repeated transposes when initializing cached encoder-attention states in the decoder.

[3.0.12]

Removed

  • Removed unused code for Weight Normalization. Minor code cleanups.

[3.0.11]

Fixed

  • Fixed training with a single, fixed learning rate instead of a rate scheduler (--learning-rate-scheduler none --initial-learning-rate ...).

3.0.10

19 Jan 07:19
bd1c091

[3.0.10]

Changed

  • The decode_step of the Sockeye model is now traced end-to-end. This creates less overhead during decoding and yields a small speedup.

[3.0.9]

Fixed

  • Fixed not calling the traced target embedding module during inference.

[3.0.8]

Changed

  • Added support for JIT tracing source/target embeddings and JIT scripting the output layer during inference.
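
A toy illustration of the difference between tracing and scripting, using stand-in modules rather than Sockeye's actual embedding and output layer:

```python
import torch

embedding = torch.nn.Embedding(100, 16)   # stand-in for a source/target embedding
output_layer = torch.nn.Linear(16, 100)   # stand-in for the output projection

# Tracing records the operations executed for a fixed example input ...
traced_embedding = torch.jit.trace(embedding, torch.tensor([[1, 2, 3]]))

# ... while scripting compiles the module itself, preserving control flow.
scripted_output_layer = torch.jit.script(output_layer)

hidden = traced_embedding(torch.tensor([[4, 5, 6]]))
logits = scripted_output_layer(hidden)
```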

3.0.7

20 Dec 09:59
6905c78

[3.0.7]

Changed

  • Improve training speed by using torch.nn.functional.multi_head_attention_forward for self- and encoder-attention
    during training. This requires reorganizing the parameter layout of the key-value input projections,
    because the current Sockeye attention interleaves them for faster inference.
    Attention masks (both source masks and autoregressive masks) need some shape adjustments because the requirements
    of the fused MHA op differ slightly. A sketch of the layout conversion follows this list.
    • Non-interleaved format for joint key-value input projection parameters:
      in_features=hidden, out_features=2*hidden -> Shape: (2*hidden, hidden)
    • Interleaved format for joint-key-value input projection stores key and value parameters, grouped by heads:
      Shape: ((num_heads * 2 * hidden_per_head), hidden)
    • Models save and load key-value projection parameters in interleaved format.
    • When model.training == True, key-value projection parameters are put into
      non-interleaved format for torch.nn.functional.multi_head_attention_forward.
    • When model.training == False, i.e. model.eval() is called, key-value projection
      parameters are again converted into interleaved format in place.
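
A self-contained sketch of converting between the two layouts described above, using toy dimensions; this is illustrative only and not Sockeye's actual conversion code.

```python
import torch

hidden, num_heads = 8, 2
head_dim = hidden // num_heads

# Non-interleaved joint key-value weight: all key rows, then all value rows.
kv = torch.arange(2 * hidden * hidden, dtype=torch.float32).reshape(2 * hidden, hidden)
k, v = kv[:hidden], kv[hidden:]

# Interleaved layout: for each head, its key rows followed by its value rows.
k_heads = k.reshape(num_heads, head_dim, hidden)
v_heads = v.reshape(num_heads, head_dim, hidden)
interleaved = torch.cat([k_heads, v_heads], dim=1).reshape(num_heads * 2 * head_dim, hidden)

# Back to the non-interleaved layout expected by the fused MHA op.
tmp = interleaved.reshape(num_heads, 2, head_dim, hidden)
restored = torch.cat([tmp[:, 0].reshape(hidden, hidden),
                      tmp[:, 1].reshape(hidden, hidden)], dim=0)
assert torch.equal(restored, kv)
```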

[3.0.6]

Fixed

  • Fixed checkpoint decoder issue that prevented using bleu as --optimized-metric for distributed training (#995).

[3.0.5]

Fixed

  • Fixed data download in multilingual tutorial.

3.0.4

13 Dec 17:39
8e5033b

[3.0.4]

  • Make sure data permutation indices are in int64 format (this does not appear to be the default on all platforms).

[3.0.3]

Fixed

  • Fixed ensemble decoding for models without target factors.

[3.0.2]

Changed

  • sockeye-translate: Beam search now computes and returns secondary target factor scores. Secondary target factors
    do not participate in beam search, but are greedily chosen at every time step. Accumulated scores for secondary factors
    are not normalized by length. Factor scores are included in JSON output (--output-type json).
  • sockeye-score now returns tab-separated scores for each target factor. Users can decide how to combine factor scores
    depending on the downstream application (see the sketch after this list). Scores for the first, primary factor
    (i.e. output words) are normalized; scores for the other factors are not.
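
A sketch of one way a downstream application might combine the tab-separated sockeye-score output, assuming the first column is the primary (words) score; the weights and file name are illustrative.

```python
weights = [1.0, 0.5]  # illustrative: primary score plus one secondary factor

with open("scores.tsv") as f:  # hypothetical file holding sockeye-score output
    for line in f:
        scores = [float(s) for s in line.strip().split("\t")]
        combined = sum(w * s for w, s in zip(weights, scores))
        print(combined)
```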

[3.0.1]

Fixed

  • Parameter averaging (sockeye-average) now always uses the CPU, which enables averaging parameters from GPU-trained models on CPU-only hosts.

3.0.0

30 Nov 09:48
c44f126

[3.0.0] Sockeye 3: Fast Neural Machine Translation with PyTorch

Sockeye is now based on PyTorch.
We maintain backwards compatibility with MXNet models in version 2.3.x until 3.1.0.
If MXNet 2.x is installed, Sockeye can run with either PyTorch or MXNet, but MXNet is no longer strictly required.

Added

  • Added model converter CLI sockeye.mx_to_pt that converts MXNet models to PyTorch models.
  • Added --apex-amp training argument that runs entire model in FP16 mode, replaces --dtype float16 (requires Apex).
  • Training automatically uses Apex fused optimizers if available (requires Apex).
  • Added training argument --label-smoothing-impl to choose the label smoothing implementation (the default, mxnet, uses the same logic as MXNet Sockeye 2). A generic sketch follows this list.
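
A generic label smoothing sketch for context; implementations differ in details such as how the smoothing mass is distributed, which is why the flag exists. The dimensions below are toy values.

```python
import torch

vocab_size, epsilon = 6, 0.1
gold = torch.tensor([2])  # index of the gold label

# Spread epsilon over the non-gold classes and keep 1 - epsilon on the gold label.
smoothed = torch.full((1, vocab_size), epsilon / (vocab_size - 1))
smoothed[0, gold] = 1.0 - epsilon
assert torch.isclose(smoothed.sum(), torch.tensor(1.0))
```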

Changed

  • CLI names point to the PyTorch code base (e.g. sockeye-train etc.).
  • MXNet-based CLIs are now accessible via sockeye-<name>-mx.
  • MXNet code requires MXNet >= 2.0 since we adopted the new numpy interface.
  • sockeye-train now uses PyTorch's distributed data-parallel mode for multi-process (multi-GPU) training. Launch with: torchrun --no_python --nproc_per_node N sockeye-train --dist ...
  • Updated the quickstart tutorial to cover multi-device training with PyTorch Sockeye.
  • Changed --device-ids argument (plural) to --device-id (singular). For multi-GPU training, see distributed mode noted above.
  • Updated default value: --pad-vocab-to-multiple-of 8
  • Removed --horovod argument used with horovodrun (use --dist with torchrun).
  • Removed --optimizer-params argument (use --optimizer-betas, --optimizer-eps).
  • Removed --no-hybridization argument (use PYTORCH_JIT=0, see Disable JIT for Debugging).
  • Removed --omp-num-threads argument (use --env=OMP_NUM_THREADS=N).

Removed

  • Removed support for constrained decoding (both positive and negative lexical constraints)
  • Removed support for beam histories
  • Removed --amp-scale-interval argument.
  • Removed --kvstore argument.
  • Removed arguments: --weight-init, --weight-init-scale, --weight-init-xavier-factor-type, --weight-init-xavier-rand-type
  • Removed --decode-and-evaluate-device-id argument.
  • Removed arguments: --monitor-pattern, --monitor-stat-func
  • Removed CUDA-specific requirements files in requirements/

2.3.24

05 Nov 09:28
35dd717

[2.3.24]

Added

  • Use of the safe YAML loader for model configuration files.
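
For context, the safe loader only constructs plain Python objects and rejects arbitrary Python object tags; a minimal sketch, with an illustrative file name:

```python
import yaml

with open("config.yaml") as f:   # illustrative path to a model config
    config = yaml.safe_load(f)   # refuses tags that would construct arbitrary objects
```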

[2.3.23]

Changed

  • Do not sort BIAS_STATE in beam search. It is constant across decoder steps.