Why the NELL dataset has much more features than 5414? #2392

Open
ranery opened this issue Apr 12, 2021 · 4 comments

@ranery

ranery commented Apr 12, 2021

From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.

❓ Questions & Help

Hi, I loaded the NELL dataset with the default dataset loader and found that the feature dimension is 61278, much larger than the reported 5414. Do you have any thoughts on this?

best

@rusty1s
Member

rusty1s commented Apr 12, 2021

We follow the experimental setup of GCN, where entity node features are enhanced by a unique one-hot representation for every relation in the dataset. The NELL paragraph in the GCN paper gives additional details.
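To make the size difference concrete, here is a minimal sketch of that kind of feature extension on toy data. It assumes each added dimension is a one-hot slot for one relation node; the function name `extend_features` and the toy sizes are hypothetical and are not taken from the library's loader.

```python
# Toy illustration (hypothetical sizes): extend entity features with a
# one-hot slot per relation node, as described in the reply above.

def extend_features(entity_feats, num_relation_nodes):
    """Return feature rows for entity nodes followed by relation nodes.

    entity_feats: list of equal-length feature rows, one per entity node.
    num_relation_nodes: number of relation nodes to append.
    """
    num_entity_feats = len(entity_feats[0]) if entity_feats else 0
    total_dim = num_entity_feats + num_relation_nodes

    rows = []
    # Entity nodes keep their original features; the new slots stay zero.
    for feats in entity_feats:
        rows.append(list(feats) + [0.0] * num_relation_nodes)
    # Each relation node gets a single 1 in its own slot.
    for r in range(num_relation_nodes):
        row = [0.0] * total_dim
        row[num_entity_feats + r] = 1.0
        rows.append(row)
    return rows


toy = extend_features([[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]], num_relation_nodes=2)
toy_dim = len(toy[0])  # 3 original features + 2 relation slots

# Arithmetic on the two numbers quoted in this thread.
extra_dims = 61278 - 5414
```

Under that assumption, the gap between the two figures in this thread (61278 and 5414) would equal the number of one-hot slots added. This is arithmetic on the quoted numbers, not a count verified against the dataset.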

@ranery
Author

ranery commented Apr 12, 2021

Thanks for your timely response!

I tried the enhanced features, but the GCN accuracy was only ~58%, below the reported 66%. Do you have any suggestions for bringing the accuracy up to the reported level?

@rusty1s
Member

rusty1s commented Apr 13, 2021

Have you modified the hyperparameters for NELL (see here)? The officially reported results also seem hard to reproduce.

@ranery
Author

ranery commented Apr 17, 2021

Thanks for sharing. Yes, the training is not stable: the first 20 epochs can reach 60+% accuracy, but further training drops the accuracy a lot.
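One common way to cope with a curve that peaks early and then degrades is to keep the checkpoint with the best validation accuracy and stop after a few epochs without improvement. The sketch below is only an illustration of that idea, not something suggested in this thread: `best_checkpoint` and `patience` are hypothetical names, and the accuracy values are made up, not measured on NELL.

```python
def best_checkpoint(val_accs, patience=10):
    """Return (best_epoch, best_val_acc, stop_epoch); epochs are 0-indexed.

    Stops once `patience` consecutive epochs fail to beat the best
    validation accuracy seen so far.
    """
    best_epoch, best_acc = -1, float("-inf")
    stop_epoch = len(val_accs) - 1
    for epoch, acc in enumerate(val_accs):
        if acc > best_acc:
            # New best: this is where a real loop would save the model.
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            stop_epoch = epoch
            break
    return best_epoch, best_acc, stop_epoch


# Made-up curve for illustration only: rises, peaks, then degrades.
toy_curve = [0.30, 0.45, 0.58, 0.61, 0.60, 0.57, 0.55, 0.52]
result = best_checkpoint(toy_curve, patience=3)
```

In a real training loop the same bookkeeping would run once per epoch, and the test accuracy would be reported for the saved best-validation checkpoint rather than for the final epoch.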

Meanwhile, I tried another GNN architecture, namely GAT, which easily achieves accuracy comparable to or better than that reported in the GCN paper. Is that common?
