
Releases: awslabs/sockeye

Release 2.1.16 (31 Jul 09:34, commit 2cbad61)

[2.1.16]

Fixed

  • Fixed batch sizing error introduced in version 2.1.12 (c00da52) that caused batch sizes to be multiplied by the number of devices. Batch sizing now works as documented (same as pre-2.1.12 versions).
  • Fixed max-word batching to properly size batches to a multiple of both --batch-sentences-multiple-of and the number of devices.

[2.1.15]

Added

  • Inference option --mc-dropout to use dropout during inference, leading to non-deterministic output. This option uses the same dropout parameters present in the model config file.
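
  For example (a sketch; the model directory and input/output paths are placeholders), repeated runs of the same command can produce different translations because dropout remains active:

      python -m sockeye.translate --models model_dir \
          --input input.txt --output output.txt \
          --mc-dropout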

[2.1.14]

Added

  • Added sockeye.rerank option --output to specify output file.
  • Added sockeye.rerank option --output-reference-instead-of-blank to output reference line instead of best hypothesis when best hypothesis is blank.
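
  A sketch of how the new options combine (file paths are placeholders, and --reference/--hypotheses are assumed to be the existing rerank input arguments):

      python -m sockeye.rerank --reference ref.txt --hypotheses hyps.nbest \
          --output reranked.txt --output-reference-instead-of-blank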

Release 2.1.13 (07 Jul 14:11, commit 292b42e)

[2.1.13]

Added

  • Training option --quiet-secondary-workers that suppresses console output from secondary workers when training with Horovod/MPI (see the example after this list).
  • Set version of isort to <5.0.0 in requirements.dev.txt to avoid incompatibility between newer versions of isort and pylint.
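
  A sketch of a Horovod run with secondary-worker output suppressed (the number of processes and the data/model paths are placeholders):

      horovodrun -np 4 python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir --horovod --quiet-secondary-workers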

[2.1.12]

Added

  • Batch type option max-word for max number of words including padding tokens (more predictable memory usage than word).
  • Batching option --batch-sentences-multiple-of that is similar to --round-batch-sizes-to-multiple-of but always rounds down (more predictable memory usage).
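
  For example (a sketch; data and model paths are placeholders), the following caps each batch at 4096 words including padding and rounds the number of sentences down to a multiple of 8:

      python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir \
          --batch-type max-word --batch-size 4096 --batch-sentences-multiple-of 8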

Changed

  • Default bucketing settings changed to width 8, max sequence length 95 (96 including BOS/EOS tokens), and no bucket scaling.
  • Argument --no-bucket-scaling replaced with --bucket-scaling which is False by default.

[2.1.11]

Changed

  • Updated sockeye.rerank module to use "add-k" smoothing for sentence-level BLEU.

Fixed

  • Updated sockeye.rerank module to use current N-best format.

Release 2.1.10 (23 Jun 15:44, commit 1e5e821)

[2.1.10]

Changed

  • Changed to a cross-entropy loss implementation that avoids the use of SoftmaxOutput.

[2.1.9]

Added

  • Added training argument --ignore-extra-params to ignore extra parameters when loading models. The primary use case is continuing training with a model that has already been annotated with scaling factors (sockeye.quantize).

Fixed

  • Properly pass the allow_missing flag to model.load_parameters().

[2.1.8]

Changed

  • Update to sacrebleu=1.4.10

Release 2.1.7 (03 Jun 09:40, commit 88dc440)

[2.1.7]

Changed

  • Optimized prepare_data by saving the shards in parallel. The prepare_data script accepts a new parameter --max-processes to control the level of parallelism with which shards are written to disk.
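
  For example (a sketch; paths are placeholders), up to 8 shards are written to disk concurrently:

      python -m sockeye.prepare_data --source train.src --target train.trg \
          --output prepared_data --max-processes 8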

[2.1.6]

Changed

  • Updated Dockerfiles optimized for CPU (intgemm int8 inference, full MKL support) and GPU (distributed training with Horovod). See sockeye_contrib/docker.

Added

  • Official support for int8 quantization with intgemm:
    • This requires the "intgemm" fork of MXNet (kpuatamazon/incubator-mxnet/intgemm). This is the version of MXNet used in the Sockeye CPU docker image (see sockeye_contrib/docker).
    • Use sockeye.translate --dtype int8 to quantize a trained float32 model at runtime.
    • Use the sockeye.quantize CLI to annotate a float32 model with int8 scaling factors for fast runtime quantization.
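
  A sketch of the two workflows (the model directory and file paths are placeholders, and the sockeye.quantize argument name is an assumption):

      # Quantize a trained float32 model at runtime
      python -m sockeye.translate --models model_dir --dtype int8 \
          --input input.txt --output output.txt

      # Or annotate the model with int8 scaling factors ahead of time
      python -m sockeye.quantize --model model_dir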

[2.1.5]

Changed

  • Changed state caching for transformer models during beam search to cache states with attention heads already separated out. This avoids repeated transpose operations during decoding, leading to faster inference.

[2.1.4]

Added

[2.1.3]

Changed

  • Performance optimizations to beam search inference
    • Removed unneeded take ops on encoder states
    • Input data is now gathered before being sent to the GPU, rather than sending each batch element individually
    • All of beam search can be done in fp16, if specified by the model
    • Other small miscellaneous optimizations
  • Model states are now a flat list in ensemble inference; the structure of the states is provided by state_structure()

[2.1.2]

Changed

Added

  • Added support for CUDA 10.2

Removed

  • Removed support for CUDA<9.1 / CUDNN<7.5

[2.1.1]

Added

  • Ability to set environment variables from training/translate CLIs before MXNet is imported. For example, users can configure MXNet as follows: --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"

[2.1.0]

Changed

  • Version bump that should have accompanied commit b0461b, which introduced incompatible models.

[2.0.1]

Changed

  • Inference defaults to using the max input length observed in training (versus scaling down based on mean length ratio and standard deviations).

Added

  • Additional parameter fixing strategies:
    • all_except_feed_forward: Only train feed forward layers.
    • encoder_and_source_embeddings: Only train the decoder (decoder layers, output layer, and target embeddings).
    • encoder_half_and_source_embeddings: Train the latter half of encoder layers and the decoder.
  • Option to specify the number of CPU threads without using an environment variable (--omp-num-threads).
  • More flexibility for combining source factors

[2.0.0]

Changed

  • Update to MXNet 1.5.0
  • Moved SockeyeModel implementation and all layers to Gluon API
  • Removed support for Python 3.4.
  • Removed image captioning module
  • Removed outdated Autopilot module
  • Removed unused training options: the Eve, Nadam, RMSProp, Nag, Adagrad, and Adadelta optimizers; the fixed-step and fixed-rate-inv-t learning rate schedulers
  • Updated and renamed learning rate scheduler fixed-rate-inv-sqrt-t -> inv-sqrt-decay
  • Added script for plotting metrics files: sockeye_contrib/plot_metrics.py
  • Removed option --weight-tying. Weight tying is enabled by default, disable with --weight-tying-type none.

Added

  • Added distributed training support with Horovod/OpenMPI. Use horovodrun and the --horovod training flag (a combined example follows this list).
  • Added Dockerfiles that build a Sockeye image with all features enabled. See sockeye_contrib/docker.
  • Added none learning rate scheduler (use a fixed rate throughout training)
  • Added linear-decay learning rate scheduler
  • Added training option --learning-rate-t-scale for time-based decay schedulers
  • Added support for MXNet's Automatic Mixed Precision. Activate with the --amp training flag. For best results, make sure as many model dimensions as possible are multiples of 8.
  • Added options for making various model dimensions multiples of a given value. For example, use --pad-vocab-to-multiple-of 8, --bucket-width 8 --no-bucket-scaling, and --round-batch-sizes-to-multiple-of 8 with AMP training.
  • Added GluonNLP's BERTAdam optimizer, an implementation of the Adam variant used by Devlin et al. (2018). Use --optimizer bertadam.
  • Added training option --checkpoint-improvement-threshold to set the amount of metric improvement required over the window of previous checkpoints to be considered actual model improvement (used with --max-num-checkpoint-not-improved).
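
  A combined sketch of the Horovod, AMP, and multiples-of-8 options from the list above (the process count and data/model paths are placeholders):

      horovodrun -np 4 python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir \
          --horovod --amp \
          --pad-vocab-to-multiple-of 8 --bucket-width 8 --no-bucket-scaling \
          --round-batch-sizes-to-multiple-of 8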

Release 1.18.115 (03 Jun 07:41, commit 482f9d4)

[1.18.115]

Added

  • Added requirements for MXNet compatible with CUDA 10.1.

[1.18.114]

Fixed

  • Fixed a bug in the prepare_train_data arguments.

[1.18.113]

Fixed

  • Added logging arguments for prepare_data CLI.

[1.18.112]

Added

  • Option to suppress creation of logfiles for CLIs (--no-logfile).

[1.18.111]

Added

  • Added an optional checkpoint callback for the train function.

Changed

  • Excluded gradients from pickled fields of TrainState

[1.18.110]

Changed

  • We now guard against failures to run nvidia-smi for GPU memory monitoring.

[1.18.109]

Fixed

  • Fixed the metric names by prefixing training metrics with 'train-' and validation metrics with 'val-'. Also restricted the custom logging function to accept only a dictionary and a compulsory global_step parameter.

[1.18.108]

Changed

  • More verbose log messages about target token counts.

[1.18.107]

Changed

Release 1.18.106 (18 Aug 08:56, commit 49e46b2)

[1.18.106]

Added

  • Added an optional time limit for stopping training. The training will stop at the next checkpoint after reaching the time limit.

[1.18.105]

Added

  • Added support for a custom metrics logger, a function passed as an extra parameter. If supplied, the logger is called during training.

[1.18.104]

Changed

[1.18.103]

Added

  • Added ability to score image-sentence pairs by extending the scoring feature originally implemented for machine
    translation to the image captioning module.

[1.18.102]

Fixed

  • Fixed loading of more than 10 source vocabulary files so that they are loaded in the correct numerical order.

[1.18.101]

Changed

  • Update to Sacrebleu 1.3.6

[1.18.100]

Fixed

  • The multiprocessing context is now always initialized. This should fix issues observed when running sockeye-train.

[1.18.99]

Changed

[1.18.98]

Changed

  • Converted several transformer-related layer implementations to Gluon HybridBlocks. No functional change.

Release 1.18.97 (07 May 14:07, commit 2d458b2)

[1.18.97]

Changed

  • Updated to PyYAML 5.1

[1.18.96]

Changed

  • Extracted the vocabulary preparation functionality of the build-vocab step into its own function. This matches the pattern in prepare_data and train, where main() only handles argument parsing and invokes a separate function to do the work, allowing modules that import this one to bypass the command line.

[1.18.95]

Changed

  • Removed custom operators from transformer models and replaced them with symbolic operators. This improves performance.

[1.18.94]

Added

  • Added ability to accumulate gradients over multiple batches (--update-interval). This allows simulating large batch sizes in environments with limited memory. For example, training with --batch-size 4096 --update-interval 2 should be close to training with --batch-size 8192, with a smaller memory footprint.
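
  A sketch of the equivalent-batch-size setup described above (data and model paths are placeholders):

      # Effective batch size of ~8192 with the memory footprint of batch size 4096
      python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir \
          --batch-size 4096 --update-interval 2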

[1.18.93]

Fixed

  • Made brevity_penalty argument in Translator class optional to ensure backwards compatibility.

Release 1.18.92 (16 Apr 14:03)

[1.18.92]

Added

  • Added sentence length (and length ratio) prediction to be able to discourage hypotheses that are too short at inference time. Can be enabled for training with --length-task and with --brevity-penalty-type during inference.
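
  A sketch of enabling this (paths are placeholders; the length-task value ratio and the brevity-penalty type learned are assumptions about the accepted choices):

      # Training: also learn to predict the target/source length ratio
      python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir --length-task ratio

      # Inference: apply a brevity penalty based on the learned length task
      python -m sockeye.translate --models model_dir \
          --input input.txt --output output.txt --brevity-penalty-type learned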

[1.18.91]

Changed

  • Multiple lexicons can now be specified with the --restrict-lexicon option:
    • For a single lexicon: --restrict-lexicon /path/to/lexicon.
    • For multiple lexicons: --restrict-lexicon key1:/path/to/lexicon1 key2:/path/to/lexicon2 ....
    • Use --json-input to specify the lexicon to use for each input, ex: {"text": "some input string", "restrict_lexicon": "key1"}.
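
  A sketch of selecting a lexicon per input via JSON (the lexicon keys and paths are placeholders):

      echo '{"text": "some input string", "restrict_lexicon": "key1"}' | \
          python -m sockeye.translate --models model_dir --json-input \
          --restrict-lexicon key1:/path/to/lexicon1 key2:/path/to/lexicon2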

[1.18.90]

Changed

  • Updated to MXNet 1.4.0
  • Integration tests no longer check for equivalence of outputs with batch size 2

[1.18.89]

Fixed

  • Made the change to per-bucket length ratios backwards compatible.

[1.18.88]

Changed

  • Made sacrebleu a pip dependency and removed it from sockeye_contrib.

[1.18.87]

Added

  • Data statistics at training time now compute mean and standard deviation of length ratios per bucket.
    This information is stored in the model's config, but not used at the moment.

[1.18.86]

Added

  • Added the --fixed-param-strategy option that allows fixing various model parameters during training via named strategies.
    These include some of the simpler combinations from Wuebker et al. (2018) such as fixing everything except the first and last layers of the encoder and decoder (all_except_outer_layers). See the help message for a full list of strategies.
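
  A sketch of continuing training from an existing model while updating only the outer encoder and decoder layers (paths are placeholders; --params is assumed to point at the pretrained parameters):

      python -m sockeye.train --prepared-data prepared_data \
          --validation-source dev.src --validation-target dev.trg \
          --output model_dir \
          --params pretrained/params.best \
          --fixed-param-strategy all_except_outer_layers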

Release 1.18.85 (15 Mar 14:07, commit c1b1da8)

[1.18.85]

Changed

  • Disabled dynamic batching for Translator.translate() by default due to increased memory usage. The default is now to fill up batches to Translator.max_batch_size. Dynamic batching can still be enabled by setting fill_up_batches to False.

Added

  • Added parameter to force training to stop after a given number of checkpoints. Useful when forced to share limited GPU resources.

[1.18.84]

Fixed

  • Fixed lexical constraints bugs that broke batching and caused large drop in BLEU.
    These were introduced with sampling (1.18.64).

[1.18.83]

Changed

  • The embedding size is automatically adjusted to the Transformer model size in case it is not specified on the command line.

[1.18.82]

Fixed

  • Fixed type conversion in metrics file reading introduced in 1.18.79.

[1.18.81]

Fixed

  • Ensured that the pickled training state contains the checkpoint decoder's BLEU score from the last checkpoint.

[1.18.80]

Fixed

  • Fixed a bug introduced in 1.18.77 where blank lines in the training data resulted in failure.

[1.18.79]

Added

  • The convergence/divergence status is now written to the metrics file, and numpy.histogram errors caused by NaNs during divergent behaviour are guarded against.

Release 1.18.78 (24 Feb 14:44, commit 86b8175)

[1.18.78]

Changed

  • Dynamic batch sizes: Translator.translate() will adjust batch size in beam search to the actual number of inputs without using padding.

[1.18.77]

Added

  • sockeye.score now loads data on demand and does not skip any input lines.

[1.18.76]

Changed

  • Do not compare scores from translation and scoring in integration tests.

Added

  • Added the flag --stop-training-on-decoder-failure to stop training in case the checkpoint decoder dies (e.g. because there is not enough memory). When this option is enabled, a checkpoint decoder is launched right when training starts in order to fail as early as possible.

[1.18.75]

Changed

  • Do not create dropout layers for inference models for performance reasons.

[1.18.74]

Changed

  • Revert change in 1.18.72 as no memory saving could be observed.

[1.18.73]

Fixed

  • Fixed a bug where source-factors-num-embed was not correctly adjusted to num-embed when using prepared data and source-factor-combine sum.