Norwegian language support #18

Open · wants to merge 1 commit into master


olavski commented May 27, 2018

Hi!

I've translated the English list into Norwegian and added about 400 new words or variations of existing words.
I hope you can add this list to the project.

@coveralls

Coverage remained the same at 96.471% when pulling dc61b89 on olavski:master into e2c8233 on fnielsen:master.

smak 2
begrensning -2
grenser -1
prosedyre, rettstvist -1
Owner

There is a comma here. Maybe you should remove "prosedyre" and keep just "rettstvist" at -1.

Author

Thank you for your quick response! I will go through these reviews and submit a new PR following your recommendations.

Yes, you are right, this should just be "rettstvist".

perfeksjonert 2
fullkommenhet 3
perfeksjonerer 2
peril -2
Owner

Is this a Norwegian word?

Author

The list was translated with Google Translate, and the list I submitted is compiled from four lists:

  • words translated into Norwegian
  • English words that could not be translated
  • new words I added manually
  • a "stoplist" of words to be removed from the final list

I left the untranslatable English words in the list, as I thought they couldn't do any harm, and English is sometimes mixed in with Norwegian anyway.

I can resubmit the list without the English words if you prefer.
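A minimal sketch of how the four lists could be combined. It assumes each list is held as a Python dict mapping a term to an integer score, and that the output uses the tab-separated `term<TAB>score` layout of the AFINN word files. The function names and the `include_english` switch (for resubmitting without the untranslated English words) are hypothetical. They are not part of the submitter's actual pipeline.

```python
from typing import Dict, Iterable


def _key(term: str) -> str:
    # Lowercase and collapse whitespace so "Vold " and "vold" become one entry
    return " ".join(term.lower().split())


def compile_lexicon(
    translated: Dict[str, int],
    untranslated_english: Dict[str, int],
    manual: Dict[str, int],
    stoplist: Iterable[str],
    include_english: bool = True,
) -> Dict[str, int]:
    """Merge the source lists, dropping stoplisted terms.

    Later sources override earlier ones, so manual additions win over
    machine translations. Normalised keys also remove exact duplicates.
    """
    sources = [translated]
    if include_english:
        sources.append(untranslated_english)
    sources.append(manual)
    merged: Dict[str, int] = {}
    for source in sources:
        for term, score in source.items():
            merged[_key(term)] = score
    stop = {_key(term) for term in stoplist}
    return {term: score for term, score in merged.items() if term not in stop}


def to_afinn_text(lexicon: Dict[str, int]) -> str:
    """Serialise as sorted term<TAB>score lines."""
    return "".join(f"{term}\t{score}\n" for term, score in sorted(lexicon.items()))


lexicon = compile_lexicon(
    translated={"positiv": 2, "oksiderer": -2},
    untranslated_english={"peril": -2},
    manual={"posttraumatisk": -2},
    stoplist=["oksiderer"],
    include_english=False,  # resubmit without the untranslated English words
)
print(to_afinn_text(lexicon), end="")
```

Because later sources override earlier ones, a corrected score only needs to appear in the manual list to take effect.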

popularitet 3
positiv 2
positivt 2
eiendomspronomen -2
Owner

Does this make sense as a word with sentiment?

Author

I'll remove 'eiendomspronomen'

positiv 2
positivt 2
eiendomspronomen -2
post traumatisk -2
Owner

This seems to be written as one word in Norwegian.

Author

Three variations are commonly used:

  • post traumatisk
  • post-traumatisk
  • posttraumatisk

When classifying text, I replace "-" with a space, so the keyword "post traumatisk" catches the first two variations.
I can add 'posttraumatisk' to catch the third variation.
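The hyphen rule described above can be sketched as a normalisation step followed by whole-word phrase matching. This is a hypothetical illustration of the approach, not the submitter's classifier. Two further assumptions are added: other punctuation is also turned into spaces, and lexicon terms are already lowercase and space-separated.

```python
import re
from typing import Dict, List


def normalise(text: str) -> str:
    """Lowercase, turn "-" into a space (the rule described above), then
    turn remaining punctuation into spaces and collapse whitespace."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware: æ, ø, å are kept
    return " ".join(text.split())


def matched_terms(text: str, lexicon: Dict[str, int]) -> List[str]:
    """Lexicon terms (words or phrases) that occur as whole-word sequences."""
    padded = f" {normalise(text)} "
    return [term for term in lexicon if f" {term} " in padded]


LEXICON = {"post traumatisk": -2, "posttraumatisk": -2}

for sentence in ["Post traumatisk stress", "Post-traumatisk stress.", "Posttraumatisk stress"]:
    print(sentence, "->", matched_terms(sentence, LEXICON))
```

The first two spellings collapse to the phrase entry. The closed compound needs its own entry, which is why adding "posttraumatisk" is still necessary.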

Owner

Fine

love 1
lovet 1
løfter 1
reklamere 1
Owner

This can mean "reklamasjon" (a complaint or warranty claim) in Norwegian and might carry a negative sentiment. I suppose it is a doubtful translation?

Author

Yes, this probably has a more negative sentiment. I can remove it for now.

forfremmet 1
fremmer 1
fremme 1
omgå 1
Owner

I think this should have a negative sentiment.

Author

yes

begeistrede 4
utslett -2
ratifisert 2
å nå 1
Owner

Could you check this? Is "å" necessary here?

Author

I think this one should be OK, actually.
"nå" on its own just means "now" and is too broad.
"å nå" means "to reach". I've double-checked with news and Twitter searches, and it is generally about reaching targets or milestones, reaching Mount Everest, reaching young people, and so on.
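To illustrate why a multi-word entry such as "å nå" can be safe even though "nå" on its own is too broad, here is a sketch of greedy longest-match scoring. It assumes the lexicon is a dict of lowercase, space-separated terms. The example sentences are made up for illustration: "Klarte å nå målet" means "managed to reach the goal", "Hva skjer nå?" means "What happens now?", and "Dårlige resultater for laget" means "Poor results for the team". Longest match is one common design choice. It is not necessarily how any particular AFINN implementation resolves overlapping phrases.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase; hyphens and other punctuation become spaces (\w keeps æ, ø, å)
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def score(text: str, lexicon: Dict[str, int]) -> Tuple[int, List[str]]:
    """Sum entry scores using greedy longest match, left to right.

    A multi-word entry is tried before its shorter parts, so "å nå" is
    matched as one unit and a bare "nå" (not in the lexicon) scores nothing.
    """
    tokens = tokenize(text)
    max_len = max(len(term.split()) for term in lexicon)
    total, hits, i = 0, [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                total += lexicon[candidate]
                hits.append(candidate)
                i += n
                break
        else:  # no entry starts at this token
            i += 1
    return total, hits


LEXICON = {"å nå": 1, "dårlige": -2, "dårlige resultater": -2}

print(score("Klarte å nå målet", LEXICON))
print(score("Hva skjer nå?", LEXICON))
print(score("Dårlige resultater for laget", LEXICON))
```

The last call shows a side effect relevant to the "dårlige resultater" entry reviewed further down. Under longest match, the phrase is counted once instead of being added to "dårlige". With equal scores, it therefore contributes nothing beyond the single word.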

konsekvenser -2
reprimande -2
irettesettelser -2
repulse -1
Owner

Is this a Norwegian word?

salutter 2
frelse 2
sarkastisk -2
lagre 2
Owner

Should this word and the next have sentiment?

Author

I'll remove these for now.

skammelig -2
dele 1
delt 1
aksjer 1
Owner

I suppose that in its normal sense this word carries little sentiment.

Author

I'll remove this one

stjal -2
stjålet -2
stoppe -1
stoppe -1
Owner

This word appears twice.

Author

Duplicates were supposed to have been removed. I'll double-check the pipeline for the next list export.
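Checks for the problems raised in this review could be run on each export before submitting. These include duplicated entries, two words joined by a comma on one line, and stray whitespace. The sketch below is minimal and assumes the exported file has one `term<TAB>score` entry per line. `check_lines` is a hypothetical helper, and the sample lines are taken from this review.

```python
from collections import defaultdict
from typing import Dict, List


def check_lines(lines: List[str]) -> List[str]:
    """Return human-readable problems found in term<TAB>score lines."""
    problems: List[str] = []
    seen: Dict[str, List[int]] = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        term, sep, raw_score = line.rstrip("\n").rpartition("\t")
        if not sep:
            problems.append(f"line {lineno}: no tab separator")
            continue
        try:
            int(raw_score)
        except ValueError:
            problems.append(f"line {lineno}: score {raw_score!r} is not an integer")
        if "," in term:
            problems.append(f"line {lineno}: {term!r} contains a comma (two entries on one line?)")
        if term != term.strip() or "  " in term:
            problems.append(f"line {lineno}: {term!r} has stray whitespace")
        seen[term].append(lineno)
    # Report every term that occurs on more than one line
    for term, linenos in seen.items():
        if len(linenos) > 1:
            problems.append(f"{term!r} appears on lines {linenos}")
    return problems


export = [
    "stoppe\t-1\n",
    "stoppe\t-1\n",
    "prosedyre, rettstvist\t-1\n",
    "stopp\t-1\n",
]
for problem in check_lines(export):
    print(problem)
```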

stopp -1
traust 2
rett 1
rar -1
Owner

This word seems to have the wrong sentiment.

Author

"rar" may be a bit different in Norwegian than in Danish.
In Norwegian, "rar" means odd/strange, so I think the sentiment is correct.

tard -2
anløpe -2
anløpet -2
oksiderer -2
Owner

This word should probably have no sentiment.

Author

added to stoplist

undergraver -2
tapte terreng -2
dårlige -2
dårlige resultater -2
Owner

This seems superfluous given the one above

Author

I'll remove this.

uverdig -2
oppløftende 2
besvær -2
som haster -1
Owner

Why "som"?

Author

removed

krenket -2
bryter med -2
vold -3
vold relatert -3
Owner

Why "vold relatert" when "vold" is there?

Owner

Should it be one word?

Author

this is very rarely used so I'll remove it

veletablerte 2
godt fokusert 2
velstelt 2
vel proporsjonert 2
Owner

I think this should be one word?

Author

yes

vinne 4
vinner 4
visdom 1
skulle ønske 1
Owner

Shouldn't "skulle" be removed?

Author

yes

fanatiker -2
fanatikere -2
nidkjær 2
abhorred -3
Owner

There are quite a few English words from here on.

reddes 2
brakseier 2
drømmetur 2
årets 1
Owner

Why does "årets" have a sentiment value?

Author

"årets" ("this year's" / "of the year") is not positive in itself, but I've found it used more often in a positive context than in a negative one.

I was doing sentiment analysis (SA) on news headlines, and "årets" is often used for a significant event: either the first of the year, or an annual event such as a festival or a person-of-the-year award.

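For these headlines, "årets" is really the only word indicating that the sentence is positive:
"Christian er årets trønder 2017" (Christian is Trønder of the year 2017)
"Salah er årets spiller i Premier League" (Salah is player of the year in the Premier League)

Other typical uses:
News:
Alexander Rybak vant årets Melodi Grand Prix (Alexander Rybak won this year's Melodi Grand Prix)
Northug slo landslagskollega og tok årets første seier (Northug beat a national-team colleague and took the year's first win)
Lothepus ble kåret til «årets kjendis» (Lothepus was voted "celebrity of the year")
Årets viktigste bok om vår tids viktigste tema (The year's most important book on the most important topic of our time)
Årets hederspris til Mari Boine (This year's honorary award goes to Mari Boine)
Terje fra Snåsa ble årets Ildsjel (Terje from Snåsa was named this year's Ildsjel, i.e. dedicated volunteer)
Hvem av disse fortjener tittelen årets bygdeprofil? (Which of these deserves the title of this year's local personality?)
Har store forventninger til årets «Hver gang vi møtes» (Has high expectations for this year's «Hver gang vi møtes»)
Hva er årets tv-øyeblikk? Stem på din Gullruten-favoritt her! (What is the TV moment of the year? Vote for your Gullruten favourite here!)

Twitter:
"Følg årets viktigste konferanse" (Follow the year's most important conference)
"Årets viktigaste budskap. " (The year's most important message.)
"årets nyhet" (the news of the year)
"Det skal i alle fall ikke stå på været når vi drar igang årets littfest" (At any rate, the weather won't hold us back when we kick off this year's litfest)

It can also be used in a negative context:
Drilltropp gikk for sakte - får ikke gå i årets 17. mai-tog (Drill team marched too slowly and is barred from this year's 17 May parade)
Nedjustert anslag for årets oljeinvesteringer (Estimate for this year's oil investments revised down)
AaFK gikk på årets første serietap (AaFK suffered their first league defeat of the year)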

kåret 2
drømmejobb 2
tjener 1
kommer til oslo 2
Owner

Does this phrase and the 3 below make sense to be given a sentiment value?

Author

I can remove these, as they are probably only relevant to my use case.

I deal mainly with news. When a headline says that someone is coming to the city, it is generally someone famous, so the sentence tends to have a positive sentiment.

sjekk resultatet 1
trosser 2
trosse 2
her er resultatet 1
Owner

Should this have a sentiment?

Author

I'll remove "sjekk resultatet" ("check the result") and "her er resultatet" ("here is the result").
Again, these were aimed at news headlines.

hetere -2
steinkasterne -2
ikke invitert -3
aldri opplevd maken -3
Owner

This phrase does not always seem to be negative?

Author

Yes, I see now that it is also used in a positive context.
I think it is mainly used in a negative context, so I could reduce the weight to -1.

ekstraskatt -2
uforståelig -2
skodrama -1
derfor skal ikke -1
Owner

Should this be included?

Author

removed

fortviler -2
avvikling -1
ryktene -1
hardporno -2
Owner

Should this be negative?

Author

Removed for now. As I deal mainly with news, if this is mentioned it is usually in a negative context.

oppdiktet -1
storslått 3
mishandling -2
påbudt, bindende -1
Owner

There are two words here.

Author

I'll fix this

apati -2
apokalyptisk -2
be om unnskyldning -1
apologized -1
Owner

English?

be om unnskyldning -1
apologized -1
beklager -3
apologizing -1
Owner

English?

overse -1
ignorert -2
ignorerer -1
jeg vil -2
Owner

Should this be included?

Author

removed.

påfører -2
innflytelsesrik 2
overtredelse -2
gjøre rasende -2
Owner

Why not just "rasende"?

Author

I'll fix this.

Owner

fnielsen left a comment

I have made some comments on the individual entries. I hope you might want to go over them?

@olavski
Author

olavski commented May 28, 2018

Thank you for taking the time to look through this!

I'm going to make some changes to the pipeline and build process and compile a new list soon.
I will leave the English words out of the new list.
I will also take out words that are too specific to news headlines and might not work well with general content. I can then compile my own lists for my specific use case.
I also want to see whether I can find and add Nynorsk translations for the current Bokmål list.
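One way to keep news-specific entries out of the general list is to maintain a base lexicon plus a domain overlay, merged at load time. The sketch below assumes both lists are dicts. `with_overlay` is a hypothetical helper, and the sample entries come from this review.

```python
from typing import Dict, Iterable


def with_overlay(
    base: Dict[str, int],
    overlay: Dict[str, int],
    removals: Iterable[str] = (),
) -> Dict[str, int]:
    """Return a new lexicon: base minus `removals`, then overlay entries added or overriding."""
    merged = dict(base)  # copy so the general list is never mutated
    for term in removals:
        merged.pop(term, None)
    merged.update(overlay)  # domain-specific scores win on conflicts
    return merged


# General-purpose list (a few entries from this PR)
general = {"vinner": 4, "rar": -1, "fremme": 1}
# Hypothetical news-headline list holding entries judged too specific for general text
news = {"årets": 1, "kommer til oslo": 2}

news_lexicon = with_overlay(general, news)
print(sorted(news_lexicon.items()))
```

Keeping the overlay separate means the general list submitted upstream stays free of use-case-specific weights.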

Do you know of any tools for calculating optimized sentiment weights for the keywords? I have several labelled training sets I could test this on.

@fnielsen
Owner

If you have a word embedding it is possible to compute another scoring, see the way I have used AFINN in this article: http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/7029/pdf/imm7029.pdf
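This sketch addresses the question about optimising weights from labelled data. It is separate from the embedding-based scoring in the linked article and does not reproduce it. A simple baseline is to start from the existing lexicon scores and nudge each word's weight by stochastic gradient descent, so that summed word scores match the labelled sentence scores. The sketch assumes a training set of `(text, score)` pairs, single-word entries only, and arbitrary hyperparameters. The L2 term pulls weights back toward their original lexicon values. The toy data is made up ("Årets spiller" means "player of the year"; "Vi vinner!" means "We win!").

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase; hyphens and other punctuation become spaces (\w keeps æ, ø, å)
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def fit_weights(
    data: List[Tuple[str, float]],
    initial: Dict[str, float],
    epochs: int = 200,
    lr: float = 0.05,
    l2: float = 0.01,
) -> Dict[str, float]:
    """SGD on squared error between summed word weights and the labelled score.

    Only words already in `initial` are tuned; the L2 penalty keeps each
    weight close to its starting (lexicon) value.
    """
    weights = dict(initial)
    for _ in range(epochs):
        for text, target in data:
            counts = Counter(t for t in tokenize(text) if t in weights)
            if not counts:
                continue  # no lexicon words, nothing to learn from this example
            prediction = sum(weights[t] * k for t, k in counts.items())
            error = prediction - target
            for t, k in counts.items():
                # d/dw of 0.5*error^2 + 0.5*l2*(w - w0)^2
                grad = error * k + l2 * (weights[t] - initial[t])
                weights[t] -= lr * grad
    return weights


# Toy labelled data (made up): "årets" starts neutral and is learned as positive
training = [("Årets spiller", 2.0), ("Vi vinner!", 3.0)]
tuned = fit_weights(training, {"årets": 0.0, "vinner": 4.0})
print({t: round(w, 2) for t, w in tuned.items()})
```

With real data, the learned weights could be rounded back to the integer scale used in the word list. A held-out labelled set would show whether they actually improve over the hand-set scores.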

@fnielsen
Owner

@olavski I am wondering if you had time to look at the possible changes?

@olavski
Author

olavski commented Jun 23, 2021

Hi @fnielsen
I'm sorry, I must have gotten side-tracked with this :)
This summer I'm swamped with work, but I've put a monthly reminder to do this so I hope it can wait a few more months.

@fnielsen
Owner

@olavski Fine! Thanks for the information.


4 participants