
Getting started: using the new features of MIGraphX 0.4


New Features in MIGraphX 0.4

MIGraphX 0.4 supports the following new features:

  • Quantization support for fp16 and int8
  • Support for NLP models, particularly BERT, with both TensorFlow and ONNX examples

This page provides examples and pointers on how to use these new features.

Quantization

Release 0.4 adds support for INT8 quantization in addition to the FP16 support introduced in release 0.3. One way int8 quantization differs from fp16 is that MIGraphX needs to determine "scale factors" to convert between fp32 and int8 values. There are two methods of determining these scale factors:

  • MIGraphX has built-in heuristics to pick the factors, or
  • MIGraphX int8 quantization functions can accept a set of "calibration data" as input. The model is run with this calibration data and the scale factors are determined by measuring intermediate values. The format of the calibration data is the same as the data later used for evaluation.

The APIs MIGraphX provides for quantization have been updated to the following:

...to be added...
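
While the API listing above is still to be filled in, the following minimal sketch shows the general shape of calibration-based quantization through the Python bindings. The function names (parse_onnx, get_target, quantize_fp16, quantize_int8) and their signatures are assumptions here, not the confirmed 0.4 interface; the point is only to show where the calibration data goes.

import numpy as np
import migraphx

prog = migraphx.parse_onnx("model.onnx")     # load the fp32 model
target = migraphx.get_target("gpu")

# Option 1: fp16 quantization needs no calibration data
# migraphx.quantize_fp16(prog)

# Option 2: int8 quantization takes calibration data in the same format as the
# data later used for evaluation; MIGraphX runs the model on it to derive the
# scale factors ("input" and its shape are placeholders for the real parameters)
calibration = [{"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}]
migraphx.quantize_int8(prog, target, calibration)

prog.compile(target)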

BERT, natural language processing (NLP) model

Release 0.4 includes improvements so that MIGraphX can optimize the BERT NLP model. Cookbook examples are included for both ONNX and TensorFlow frozen graphs. These examples are based on the following repositories:

Description: ONNX BERT model based on the pytorch-transformers repository

Start by creating an ONNX file saved from the pytorch-transformers repository. The first step is to get the sources of the repository:

prompt% git clone https://github.com/huggingface/pytorch-transformers

The next step is to modify the pytorch-transformers/examples/run_glue.py script to dump an ONNX file after the training step completes. We do this by adding the following code:

with torch.no_grad():
    model.eval()
    # export the trained model, using the current batch as example inputs
    torch.onnx.export(model, (batch[0], batch[1], batch[2]),
                      'bert_' + args.task_name.lower() + str(args.eval_batch_size) + '.onnx',
                      verbose=True)

immediately following the code that says

if args.output_mode == "classification":
   preds = np.argmax(preds, axis=1)
elif args.output_mode == "regression":
   preds = np.squeeze(preds)

The torch.onnx.export call creates an ONNX model that expects the following three inputs, each of the same sequence length:

  • input_ids - the sequence of tokens
  • input_mask - 1 means the corresponding input_id is valid, 0 means it is not
  • segment_ids - 0 means the token is part of the first segment, 1 means it is part of the second segment

Processing the GLUE benchmark files into the format expected by run_glue.py (and the ONNX model) involves multiple steps, as summarized by the following example for MRPC.

The MRPC training data file starts with a set of tab-separated lines; the following is the first data line. The first field ("1") indicates that the two sentences are paraphrases. The fourth field (starting with "He said...") is the first sentence. The fifth field (starting with "The foodservice...") is the second sentence.

1       1355540 1355592 He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .    " The foodservice pie business does not fit our long-term growth strategy .
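
As an illustration, the relevant fields can be pulled out of each such line with a few lines of Python. The file name and the header-line skip below are assumptions about the local copy of the MRPC data:

with open("msr_paraphrase_train.txt") as f:   # file name is an assumption
    next(f)                                   # skip the header line (assumption)
    for line in f:
        fields = line.rstrip("\n").split("\t")
        label, sentence_a, sentence_b = fields[0], fields[3], fields[4]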

The first and second sentences are tokenized, and each token is looked up in the BERT vocabulary. This results in the following token ids for the first sentence:

1124,1163,1103,11785,1200,14301,16288,1671,2144,112,189,4218,1103,1419,112,188,1263,118,1858,3213,5564,119

and the following token ids for the second sentence:

107,1109,11785,1200,14301,16288,1671,1674,1136,4218,1412,1263,118,1858,3213,5564,119
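
These ids can be obtained with the tokenizer that ships with pytorch-transformers. A minimal sketch, assuming the bert-base-cased vocabulary (the exact pretrained model name is an assumption):

from pytorch_transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")   # vocabulary choice is an assumption

sentence_a = "He said the foodservice pie business doesn 't fit the company 's long-term growth strategy ."
sentence_b = "\" The foodservice pie business does not fit our long-term growth strategy ."

ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(sentence_a))
ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(sentence_b))
print(ids_a)   # should correspond to the first list above
print(ids_b)   # should correspond to the second list above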

This combination results in the following values being assigned to the ONNX model inputs:

  • input_ids - is assigned a [CLS] token, the tokens for the first sentence, a [SEP] token, the tokens for the second sentence, and a [SEP] token.
  • input_mask - is assigned a '1' for each of the 42 tokens in input_ids, followed by '0' values.
  • segment_ids - is assigned 24 '0' values for the first sentence, followed by 18 '1' values for the second sentence.
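
Continuing the tokenizer sketch above, the three inputs can be assembled as follows. The fixed maximum sequence length of 128 is an assumption; use whatever length run_glue.py was configured with:

max_seq_length = 128                      # assumption
cls_id, sep_id = tokenizer.convert_tokens_to_ids(["[CLS]", "[SEP]"])

input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]     # 42 values for this example
segment_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)  # 24 zeros, then 18 ones
input_mask = [1] * len(input_ids)

padding = max_seq_length - len(input_ids)                      # pad all three with zeros
input_ids += [0] * padding
input_mask += [0] * padding
segment_ids += [0] * padding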