This is free to use, so do whatever you want.
The Translator uses beam search to find the best sentence.
It should also work with other architectures, such as sequence-to-sequence models.
The Translator only accepts a single sentence as input: for each of the top k candidate sentences we want to predict the next top k words, so we feed the model a batch of size k.
You can also use tf.TensorArray to write the sentence tokens; this could boost performance (I did not test it).
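For illustration, here is a minimal sketch of the batch-of-k trick with made-up shapes and variable names (not the repository's actual code): the single source sentence is tiled k times so one forward pass scores all k candidate sentences at once.

import tensorflow as tf

k = 3
encoder_input = tf.constant([[12, 45, 7, 3]])        # one source sentence, shape (1, 4)
candidates = tf.constant([[1, 8], [1, 9], [1, 14]])  # top-k partial translations, shape (3, 2)

# Repeat the source sentence k times: one row per candidate sentence,
# so a single model call predicts the next token for all k candidates.
encoder_batch = tf.tile(encoder_input, multiples=[k, 1])
print(encoder_batch.shape)  # (3, 4)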
Let $x$ be the input sentence and let $y^{(1)} = (y^{(1)}_1, \dots, y^{(1)}_{T_1})$ and $y^{(2)} = (y^{(2)}_1, \dots, y^{(2)}_{T_2})$ be two candidate output sentences. The probability to get the first sentence is

$$P(y^{(1)} \mid x) = \prod_{i=1}^{T_1} P\big(y^{(1)}_i \mid x, y^{(1)}_1, \dots, y^{(1)}_{i-1}\big)$$

and the probability to get the second sentence is

$$P(y^{(2)} \mid x) = \prod_{i=1}^{T_2} P\big(y^{(2)}_i \mid x, y^{(2)}_1, \dots, y^{(2)}_{i-1}\big)$$

Note that the two sentences can have different lengths. We search for the sentence with the highest probability,

$$\hat{y} = \underset{y}{\arg\max}\; P(y \mid x),$$

and we can formulate this as a sum of logarithms, normalized by the sentence length, since the raw product can get very small:

$$\mathrm{score}(y) = \frac{1}{T^{\alpha}} \sum_{i=1}^{T} \log P(y_i \mid x, y_1, \dots, y_{i-1})$$
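To see why the raw product is numerically fragile, here is a small self-contained check (illustrative numbers only):

import tensorflow as tf

# A 60-token sentence where every token has probability 0.1:
# the raw product underflows float32 to exactly 0.0, while the
# sum of logs stays well within the representable range.
probs = tf.fill([60], 0.1)
print(tf.math.reduce_prod(probs).numpy())              # 0.0 (underflow)
print(tf.math.reduce_sum(tf.math.log(probs)).numpy())  # about -138.2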
import tensorflow as tf

def _compute_sum_log_value(new_sentence, sentence_vec, end_token, alpha=0.7):
    """Compute 1/T^alpha * sum_i log(P(y_i | x, y_1, ..., y_{i-1})).

    args:
        new_sentence : a tuple of tokens representing the predicted sentence
        sentence_vec : the per-token probabilities the model computed for new_sentence
        end_token : int, represents the end token, for example 3
        alpha : float, the length-normalization exponent

    The mask encodes the true length T of the sentence, so that the
    sentence ends at the first end_token.
    """
    mask = _create_prediction_mask(new_sentence, end_token)
    mask = tf.cast(mask, dtype=tf.float32)
    # Take the log first and apply the mask afterwards, so that masked
    # positions contribute 0 to the sum instead of log(0) = -inf.
    # Probabilities beyond the sentence end are replaced with 1 to keep
    # the log finite.
    safe_probs = tf.where(mask > 0, sentence_vec, tf.ones_like(sentence_vec))
    sentence_value = tf.math.log(safe_probs) * mask
    return tf.math.reduce_sum(sentence_value, axis=-1) / tf.math.pow(
        tf.math.reduce_sum(mask, axis=-1), alpha)  # normalize by length^alpha
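As a quick sanity check, here is a hypothetical call with made-up tokens and probabilities (end_token = 3); everything after the first 3 is ignored by the mask:

pred = tf.constant([[5, 7, 3, 9]])           # the sentence ends at token 3
probs = tf.constant([[0.5, 0.4, 0.9, 0.2]])  # model probability of each token
score = _compute_sum_log_value(pred, probs, end_token=3)
# mask = [1, 1, 1, 0], so score = (log 0.5 + log 0.4 + log 0.9) / 3**0.7 ≈ -0.79
print(score.numpy())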
In order to compute the probability for each sentence, we need to compute a mask for each sentence; this mask has value 0 at every position after the sentence has ended.
def _create_prediction_mask(pred, end_token):
    # first part: mark every end token and find where the cumulative count
    # of end tokens equals 1 (positions after a later end token have a
    # cumulative sum greater than 1)
    mask_end_token = tf.cast(tf.math.equal(pred, end_token), dtype=tf.int16)
    mask_end_token_sum = tf.math.cumsum(mask_end_token, axis=-1)
    mask_end_token_sum = tf.cast(tf.where(mask_end_token_sum == 1, 1, 0), dtype=tf.bool)
    mask_end_token = tf.cast(mask_end_token, dtype=tf.bool)
    # second part: keep only the first end token
    mask_end_token = tf.math.logical_and(mask_end_token, mask_end_token_sum)
    # third part: True until the first end token, False afterwards
    mask_not_end = tf.cast(tf.math.logical_not(mask_end_token), dtype=tf.float32)
    mask_not_end = tf.cast(tf.math.cumprod(mask_not_end, axis=-1), tf.bool)
    # combine: True up to and including the first end token
    mask = tf.math.logical_or(mask_not_end, mask_end_token)
    return mask
The first part creates two masks for pred:
mask_end_token_sum: True from the first end token up to (but not including) any second end token, i.e. wherever the cumulative sum of end tokens equals 1
mask_end_token: 1 if the token is an end token, else 0
The second part combines them with a logical AND, creating a mask that is 1 only at the first end token; all other positions are 0.
The third part creates a mask for pred that is 1 until the first end token is reached; everything afterwards is 0.
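To make the three parts concrete, here is a hypothetical input containing two end tokens (end_token = 3), with the intermediate masks each part produces:

pred = tf.constant([[5, 7, 3, 9, 3]])
# first part:  mask_end_token     = [0, 0, 1, 0, 1]
#              mask_end_token_sum = [F, F, T, T, F]   (cumulative sum == 1)
# second part: logical_and        = [F, F, T, F, F]   (only the first end token)
# third part:  cumprod of the NOT = [T, T, F, F, F]   (True until the first end token)
# final logical_or:                 [T, T, T, F, F]
print(_create_prediction_mask(pred, end_token=3))
# tf.Tensor([[ True  True  True False False]], shape=(1, 5), dtype=bool)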