
One question regarding padding #4

Open
Chandrak1907 opened this issue Dec 15, 2018 · 5 comments

Comments

@Chandrak1907

Hi,

I see that you pad the inputs to a fixed length of 52, but it seems the padding is applied only to the character inputs, not to the words.

 # 0-pads all character sequences to a common length
 def padding(Sentences):
     maxlen = 52
     # find the longest word so that nothing is truncated
     for sentence in Sentences:
         char = sentence[2]
         for x in char:
             maxlen = max(maxlen, len(x))
     # pad every word's character indices up to maxlen
     for i, sentence in enumerate(Sentences):
         Sentences[i][2] = pad_sequences(Sentences[i][2], maxlen, padding='post')
     return Sentences
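As a sanity check on what post-padding does here, the behavior of `pad_sequences(..., padding='post')` can be sketched in plain Python. The character indices below use a made-up mapping (a=1 … z=26), not the repository's actual `char2Idx`:

```python
# Minimal sketch of post-padding character index lists, mimicking
# pad_sequences(..., padding='post'). The index values are hypothetical.
def pad_post(seqs, maxlen, value=0):
    """Right-pad each sequence with `value` up to maxlen, truncating longer ones."""
    return [list(s[:maxlen]) + [value] * (maxlen - len(s[:maxlen])) for s in seqs]

# Character indices for the words "ner" and "tagging" (a=1 ... z=26)
chars = [[14, 5, 18], [20, 1, 7, 7, 9, 14, 7]]
padded = pad_post(chars, maxlen=8)
# Every word now occupies exactly 8 character slots, zeros on the right
assert all(len(p) == 8 for p in padded)
```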

Each entry in Sentences is built as follows:

         dataset.append([wordIndices, caseIndices, charIndices, labelIndices]) 
     return dataset

I also see that you have made batches of inputs in which all words have equal length. Is this the correct approach?
Can you please let me know?

@mxhofer
Owner

mxhofer commented Dec 15, 2018

Hello! That's correct. This is because the convolutional neural network (CNN) processes the padded character vectors, which must be of equal length. Alternatively, one could split the CNN input into equal-size batches (e.g. here). Each input batch to the bi-directional LSTM has the same length, which depends on how many words there are in a document.
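The batching idea described above can be sketched as follows: group sentences by word count so that every batch fed to the bi-directional LSTM has a uniform length. This is a hypothetical helper for illustration, not the repository's exact implementation:

```python
from collections import defaultdict

# Sketch: bucket sentences by their number of words so each batch has
# uniform length (hypothetical helper, toy data).
def batch_by_length(sentences):
    buckets = defaultdict(list)
    for s in sentences:
        buckets[len(s)].append(s)
    return list(buckets.values())

docs = [["EU", "rejects", "call"], ["Peter", "Blackburn"], ["BRUSSELS", "1996-08-22"]]
batches = batch_by_length(docs)
# Two batches: one holding the 3-word sentence, one holding both 2-word sentences
assert sorted(len(b) for b in batches) == [1, 2]
```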

@Chandrak1907
Author

Thank you for responding. One follow-up question:
what was the rationale behind padding characters to a maximum length of 52? There are 26 upper-case letters, 26 lower-case letters, and other punctuation characters. Can you please let me know?

@mxhofer
Owner

mxhofer commented Dec 15, 2018

The maximum length was chosen after analyzing word lengths in the documents, such that no words are cut off.
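The analysis described above can be sketched like this: scan every word in the corpus and take the longest, so that the padding length truncates nothing. The corpus below is toy data for illustration:

```python
# Sketch of choosing the character-padding length: find the longest word
# in the corpus so no word is cut off (toy documents, not the real data).
def max_word_length(documents):
    return max(len(word) for doc in documents for word in doc)

corpus = [["pneumonoultramicroscopic", "terms"], ["short", "words", "only"]]
print(max_word_length(corpus))  # length of the longest word in the corpus
```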

@Chandrak1907
Author

There is some confusion. In my understanding, padding is applied only to characters, not to words.

@mxhofer
Owner

mxhofer commented Dec 16, 2018

Padding is indeed applied to characters. For example, the padded character-level input below is for the word "RECORD". The output of the padding(Sentences) function is a list of documents, each of which is a list of words, cases, characters and labels (see the output of the createMatrices(sentences, word2Idx, label2Idx, case2Idx, char2Idx) function).
[Screenshot: padded character-level input for the word "RECORD"]
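For readers without the screenshot, the padded character input for "RECORD" might look like this under a hypothetical char2Idx mapping (A=1 … Z=26, with index 0 reserved for padding; the repository's actual mapping may differ):

```python
# Hypothetical illustration: padded character indices for the word "RECORD"
# with a made-up char2Idx (A=1 ... Z=26, 0 = padding token).
char2Idx = {c: i + 1 for i, c in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}
word = "RECORD"
maxlen = 52
padded = [char2Idx[c] for c in word] + [0] * (maxlen - len(word))
# First six slots hold the indices of R,E,C,O,R,D; the remaining 46 are zeros
assert len(padded) == 52 and padded[:6] == [18, 5, 3, 15, 18, 4]
```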
