Some doubts on the datasets #1

wujsAct · 2018-03-01T06:10:28Z

KNET is excellent work and is very useful for many applications.
I recently read your AAAI 2018 paper and downloaded this code.
I found that the context sequences in valid_context.npy, test_context.npy
and train_context.npy appear to be out of order, so it may be impossible for us to reuse this data.
Also, are the left and right context sequences each 15 words long?

ji-xin · 2018-03-01T07:22:05Z

*_context.npy files are organized in the following way.
For a sentence

...a5, a4, a3, a2, a1, ENTITY WORDS, b1, b2, b3, b4, b5, ...

it's stored as [a1, b1, a2, b2, ..., a15, b15], i.e. left and right context words alternate, starting with the words nearest the entity.

And yes, the context on each side has a window of 15 words, which sometimes extends beyond the sentence boundary. If there are not enough words, say, at the beginning of a paragraph, unk is used as padding.
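For anyone who wants the contexts back in reading order, here is a minimal sketch of undoing the interleaving for one row. It assumes each row is a flat sequence of 30 items in the layout described above; `split_context` is a hypothetical helper, not a function from this repository, and the actual array shape in the .npy files should be checked before relying on it.

```python
def split_context(row, window=15):
    """Split one interleaved context row into left and right context.

    Assumes the layout [a1, b1, a2, b2, ..., a15, b15], where a1 and b1
    are the words nearest the entity. This helper is illustrative only.
    """
    if len(row) != 2 * window:
        raise ValueError("expected %d items, got %d" % (2 * window, len(row)))
    left_nearest_first = row[0::2]   # a1, a2, ..., a15
    right = row[1::2]                # b1, b2, ..., b15 (already in reading order)
    left = left_nearest_first[::-1]  # a15, ..., a2, a1 (reading order)
    return list(left), list(right)


# Toy row with window=3; the tokens are made up for illustration.
row = ["a1", "b1", "a2", "b2", "a3", "b3"]
left, right = split_context(row, window=3)
print(left)   # ['a3', 'a2', 'a1']
print(right)  # ['b1', 'b2', 'b3']
```

The same slicing works on a 1-D NumPy row taken from an array loaded with `np.load`, since it only uses strided slices. Padding entries (unk) are kept as-is and would need to be filtered separately if they are not wanted.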

Does this solve your doubts?

wujsAct · 2018-03-01T07:32:32Z

Thanks for your explanation.

ji-xin closed this as completed Mar 1, 2018

ji-xin mentioned this issue Mar 28, 2018

Preparing new data for testing #2

Closed

ji-xin mentioned this issue Sep 19, 2018

Questions about the train_context file #5

Closed
