
Some doubts on the datasets #1

Closed
wujsAct opened this issue Mar 1, 2018 · 2 comments

Comments

@wujsAct

wujsAct commented Mar 1, 2018

KNET is excellent work and is very useful for many applications.
Recently, I followed your AAAI 2018 paper and downloaded the code.
I found that the context sequences in valid_context.npy, test_context.npy,
and train_context.npy appear to be out of order, so it may be impossible for us to reuse this data.
Also, are the left and right context sequences each 15 words long?

@ji-xin
Collaborator

ji-xin commented Mar 1, 2018

The *_context.npy files are organized as follows.
For a sentence

...a5, a4, a3, a2, a1, ENTITY WORDS, b1, b2, b3, b4, b5, ...

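its context is stored as [a1, b1, a2, b2, ..., a15, b15], i.e. left and right words are interleaved, and each side runs outward from the word nearest the entity.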
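A minimal Python sketch of how one row in this layout could be split back into left and right contexts in reading order. It assumes each row holds one mention's 2 × 15 context tokens as plain strings; the helper names `split_context` and `restore_sentence` are hypothetical and not part of the KNET code.

```python
from typing import List, Sequence, Tuple

WINDOW = 15  # words of context on each side of the entity


def split_context(row: Sequence[str], window: int = WINDOW) -> Tuple[List[str], List[str]]:
    """Split an interleaved row [a1, b1, a2, b2, ..., aW, bW] into
    (left, right) contexts, both in left-to-right reading order."""
    # Assumption: a row is a flat sequence of exactly 2 * window tokens.
    if len(row) != 2 * window:
        raise ValueError(f"expected {2 * window} tokens, got {len(row)}")
    left_nearest_first = list(row[0::2])  # even positions: a1, a2, ..., aW
    right = list(row[1::2])               # odd positions:  b1, b2, ..., bW
    left = left_nearest_first[::-1]       # reverse to aW, ..., a2, a1
    return left, right


def restore_sentence(row: Sequence[str], entity_words: Sequence[str],
                     window: int = WINDOW) -> List[str]:
    """Rebuild the window '... a2 a1 ENTITY b1 b2 ...' around the entity."""
    left, right = split_context(row, window)
    return left + list(entity_words) + right


if __name__ == "__main__":
    demo = ["a1", "b1", "a2", "b2", "a3", "b3"]  # toy row with window=3
    print(restore_sentence(demo, ["ENTITY"], window=3))
    # ['a3', 'a2', 'a1', 'ENTITY', 'b1', 'b2', 'b3']
```

If a file is loaded with `numpy.load` into a 2-D array with one mention per row (an assumption about its shape), the same slices apply column-wise: `ctx[:, 0::2]` gives the left words nearest-first and `ctx[:, 1::2]` the right words.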

And yes, the context on each side has a window of 15 words, which sometimes even extends beyond the sentence boundary. If there are not enough words, e.g. at the beginning of a paragraph, unk is used as padding.
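The window-and-padding rule can be sketched the same way in Python. This is an illustration under assumptions, not the KNET preprocessing code: `build_context` is a hypothetical helper, the entity is given as a half-open span `tokens[start:end]` over a paragraph-level token list (which is what lets the window run past sentence boundaries), and the padding token is written as the string `"unk"`, although the exact token or ID stored in the files may differ.

```python
from typing import List, Sequence


def build_context(tokens: Sequence[str], start: int, end: int,
                  window: int = 15, pad: str = "unk") -> List[str]:
    """Return the interleaved context [a1, b1, ..., aW, bW] for the entity
    occupying tokens[start:end].

    a_i is the i-th word left of the entity and b_i the i-th word right of it.
    Positions that fall outside `tokens` are filled with `pad`.
    """
    # Hypothetical helper; pad="unk" stands in for whatever padding token
    # the real files use.
    row: List[str] = []
    for i in range(1, window + 1):
        left_idx = start - i       # i-th word before the entity
        right_idx = end - 1 + i    # i-th word after the entity's last word
        # Guard explicitly against negative indices, which Python would wrap.
        row.append(tokens[left_idx] if left_idx >= 0 else pad)
        row.append(tokens[right_idx] if right_idx < len(tokens) else pad)
    return row


if __name__ == "__main__":
    # Entity "Paris" at the start of a short paragraph: the left side is all padding.
    para = "Paris is the capital of France .".split()
    print(build_context(para, 0, 1, window=3))
    # ['unk', 'is', 'unk', 'the', 'unk', 'capital']
```

In this sketch, padding appears only where the window runs off either end of the token list it is given, so passing a whole paragraph rather than a single sentence is what lets the context cross sentence boundaries.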

Does this clear up your doubts?

@wujsAct
Author

wujsAct commented Mar 1, 2018

Thanks for your explanation.
