
Predictions are horribly wrong #145

Open
davidniki02 opened this issue Jun 28, 2018 · 4 comments

Comments

@davidniki02

I have trained magpie on a news dataset. I have 9 labels for my data.

I trained the model and tested the following text using magpie.predict_from_text():

Más de 690 mil casos de inmigrantes esperan ser resueltos por tribunales de Inmigración WASHINGTON— La Administración Trump ha convertido las protecciones de menores en sinónimo de “lagunas legales” que el Congreso debe eliminar pero mientras tanto, sobre el terreno, tampoco ha mejorado el atasco de más de 692,000 casos pendientes en los tribunales de Inmigración, según expertos.

(English: "More than 690 thousand immigrant cases await resolution by Immigration courts. WASHINGTON— The Trump Administration has turned protections for minors into a synonym for 'legal loopholes' that Congress must eliminate, but meanwhile, on the ground, it has not improved the backlog of more than 692,000 pending cases in the Immigration courts either, according to experts.")

While I don't have ANY Spanish documents in my training samples, magpie returns a 90% chance that this text belongs to one of my labels! It even predicts similar results for 3 other categories, all of them irrelevant. I even tried to see if there are any words that are causing this, but could not find any.

What could be wrong here? I trained on 400-500 documents per category and tried both 30 and 50 epochs (no change in results).

@jstypka
Collaborator

jstypka commented Jul 2, 2018

Well, if you didn't feed it any Spanish text before, the network will return random results. For the network to build representations for words (in any language), they need to appear in the training set at least N times (N=5 by default). Otherwise Magpie simply has no idea what is being fed into it and might be triggered by random noise such as "Washington" or "Trump" in your case.

The rule is - you should test/predict on the same type of data as you train.
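To illustrate the cutoff described above: here is a minimal sketch of how a min-occurrence vocabulary filter works in principle (function names and the `min_count` parameter are my own for illustration; this is not Magpie's actual API, though the N=5 default matches what's described above). Words below the threshold never get a learned representation and are effectively invisible to the model:

```python
from collections import Counter

def build_vocab(documents, min_count=5):
    """Keep only words that appear at least `min_count` times across
    the training corpus; everything below the cutoff becomes
    out-of-vocabulary (OOV) and gets no learned representation."""
    counts = Counter(word for doc in documents for word in doc.split())
    return {word for word, n in counts.items() if n >= min_count}

# "trump" appears 5 times and makes the cutoff;
# "rare" appears once and is dropped, so at prediction time
# it contributes nothing the network can interpret.
docs = ["trump visits washington"] * 5 + ["rare word appears once"]
vocab = build_vocab(docs)
```

This is why an entirely Spanish input degenerates to noise: nearly every token falls outside the learned vocabulary, so the network's output is driven by whatever few known tokens (like "Washington" or "Trump") happen to survive.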

@davidniki02
Author

The thing that worries me is the high confidence - 95% in some cases. If it does not recognize the words, should it not at least be careful about its predictions?
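One pragmatic workaround for this concern (my own suggestion, not a Magpie feature): measure what fraction of the input tokens are out-of-vocabulary before trusting the prediction, and flag the result as unreliable above some threshold. A minimal sketch:

```python
def oov_ratio(text, vocab):
    """Fraction of tokens in `text` that the model has never seen.
    A ratio near 1.0 means the prediction is essentially noise."""
    words = text.lower().split()
    if not words:
        return 1.0
    return sum(w not in vocab for w in words) / len(words)

# Hypothetical learned vocabulary from an English news corpus.
vocab = {"trump", "visits", "washington"}

# Every token of the Spanish snippet is unknown, so the ratio is 1.0
# and the 90-95% confidence score should be discarded, not trusted.
ratio = oov_ratio("Más de 690 mil casos", vocab)
```

The high confidence itself is expected behaviour for a softmax/sigmoid output layer: it always produces a score distribution, even for garbage input, so calibration on unfamiliar data has to be handled outside the network.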

@shashi-netra

I have the same issue; the results are equally poor even when I test on part of the training corpus.
