News_Category_Dataset_v2 #188

Open
davidniki02 opened this issue Jun 15, 2019 · 1 comment

Comments

@davidniki02

I'm trying magpie on News_Category_Dataset_v2.json, which contains 200,000 HuffPost news items with their categories. Although I got 81% accuracy during training, the actual predictions are far off: the score barely ever rises above 50% on any prediction, and most of the time it predicts the "politics" category even when the text is purely "technical".

Do we need a bigger sample to train on, or is there something wrong?

@bigredbug47

Hi @davidniki02,

Many aspects can affect the accuracy. I think 200,000 news items is enough to train the model, so the problem may come from your dataset.

Did you analyze the dataset's distribution: is it balanced or not? What are the most frequent representative words in each category? etc.
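
A minimal sketch of both checks, in case it helps. It assumes the file stores one JSON object per line and that each record has `category` and `headline` fields; adjust the keys if your copy differs. The helper names (`load_records`, `category_distribution`, `top_words`) are made up for this example and are not part of magpie.

```python
import json
import re
from collections import Counter, defaultdict


def load_records(path):
    """Read a JSON-lines file (assumed layout: one JSON object per line)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def category_distribution(records, label_key="category"):
    """Return {category: share of all records}, largest share first."""
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


def top_words(records, label_key="category", text_key="headline",
              k=10, stopwords=frozenset()):
    """Return {category: k most frequent lowercase tokens in its texts}."""
    per_category = defaultdict(Counter)
    for r in records:
        tokens = re.findall(r"[a-z']+", r[text_key].lower())
        per_category[r[label_key]].update(
            t for t in tokens if t not in stopwords
        )
    return {
        cat: [word for word, _ in counter.most_common(k)]
        for cat, counter in per_category.items()
    }


# Example usage (path and field names are assumptions):
# records = load_records("News_Category_Dataset_v2.json")
# for cat, share in category_distribution(records).items():
#     print(f"{cat:20s} {share:.1%}")
# print(top_words(records, k=10, stopwords={"the", "a", "to", "of"}))
```

If one category takes a much larger share than the others, a model can reach a decent overall accuracy while still leaning toward that category on unseen text, so it is worth comparing the shares before collecting more data. Likewise, if the top words of two categories overlap heavily, the model has little to separate them with.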
