Toxicity Measurement #262
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for adding this! The PR is in pretty good shape already! Mostly added some comments about efficiently loading the pipeline.
Just looking at the functionality, this also seems to me like a case where it is not so clear why this shouldn't be a measurement: essentially you look at text, and it doesn't matter much whether it is generated or human-written.
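For context, here is a minimal sketch of what "loading the pipeline efficiently" could look like: the classifier is instantiated once in `_download_and_prepare` and reused by every `compute` call, instead of being reloaded each time. The model name, label name, and class layout are assumptions for illustration, not the final PR code.

```python
import datasets
import evaluate
from transformers import pipeline


class Toxicity(evaluate.Measurement):
    def _info(self):
        return evaluate.MeasurementInfo(
            description="Scores how toxic each input sentence is.",
            citation="",
            inputs_description="",
            features=datasets.Features({"predictions": datasets.Value("string")}),
        )

    def _download_and_prepare(self, dl_manager):
        # Load the classifier a single time, when the measurement itself is loaded.
        self.toxic_classifier = pipeline(
            "text-classification",
            model="facebook/roberta-hate-speech-dynabench-r4-target",  # assumed model
            top_k=None,  # return scores for every label, not just the top one
            truncation=True,
        )

    def _compute(self, predictions, toxic_label="hate"):
        # Keep only the probability assigned to the requested toxic label.
        scores = []
        for all_scores in self.toxic_classifier(list(predictions)):
            scores.append(
                next(s["score"] for s in all_scores if s["label"] == toxic_label)
            )
        return {"toxicity": scores}
```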
measurements/toxicity/toxicity.py (Outdated)

    Args:
        `predictions` (list of str): prediction/candidate sentences
        `toxic_label` (optional): the toxic label that you want to detect, depending on the labels that the model has been trained on.
Add that the type is str. Should we specify that right now we only allow for one label here? Toxicity is often a multi-class prediction problem, with toxicity along several axes (e.g. identity-based hate vs. racism), but right now we only handle one class.
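A small sketch of what the suggested docstring line could look like with the type added; the exact wording here is hypothetical.

```python
"""
Args:
    `predictions` (list of str): prediction/candidate sentences
    `toxic_label` (str, optional): the toxic label to detect; must be one of the labels the model was trained on.
"""
```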
And how would you aggregate the results across different labels?
E.g. if you have {'offensive': 0.65, 'hate': 0.98}, then what?
Ah yes, aggregation would be a bit tricky. I think the Perspective API (as an example) reports these results back unaggregated: there's an individual score for each category ('identity hate', 'toxicity', 'sexism', 'racism', 'sexually explicit', etc.) and no aggregation across categories.
I assume that as an end user of a toxicity API you'd want to handle sexually explicit content differently from identity-based hate, so the granularity is helpful/necessary. An equivalent approach here would be to skip aggregation when several types of toxicity are specified and report back per toxicity class (e.g. `toxic_labels` is a list instead of a str). What do you think?
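To make the idea concrete, here is a hypothetical sketch (not code from this PR) of unaggregated per-class reporting if `toxic_labels` were a list; the classifier output format is assumed.

```python
def per_label_scores(classifier_outputs, toxic_labels=("offensive", "hate")):
    """classifier_outputs: one list of {"label", "score"} dicts per input text."""
    results = {label: [] for label in toxic_labels}
    for all_scores in classifier_outputs:
        by_label = {s["label"]: s["score"] for s in all_scores}
        for label in toxic_labels:
            # Report each requested toxicity class separately, no aggregation.
            results[label].append(by_label.get(label, 0.0))
    return results


# Example: two input texts scored by a multi-label classifier.
outputs = [
    [{"label": "offensive", "score": 0.65}, {"label": "hate", "score": 0.98}],
    [{"label": "offensive", "score": 0.10}, {"label": "hate", "score": 0.02}],
]
print(per_label_scores(outputs))
# {'offensive': [0.65, 0.1], 'hate': [0.98, 0.02]}
```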
I can only find binary hate speech classification models on the Hub at the moment, so maybe we keep it like this for now?
updating examples
Looks good, left mostly nits! I think there is an issue with the docstring, based on the CI error. Also, if you merge main into your branch, the timeout issue in the CI should not be there anymore.
From the CI: Toxicity has inconsistent leading whitespace: ' `aggregation` (optional): determines the type of aggregation performed on the data. If set to `None`, the scores for each prediction are returned.'
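For reference, a rough usage sketch of how the `aggregation` argument from that docstring could behave; the option names ("maximum", "ratio") and return keys are assumptions based on this thread rather than a merged API.

```python
import evaluate

# Hypothetical usage of the measurement under discussion.
toxicity = evaluate.load("toxicity", module_type="measurement")
texts = ["this is a perfectly nice sentence", "you are a horrible person"]

# aggregation=None (the default): one toxicity score per prediction.
print(toxicity.compute(predictions=texts))

# Possible aggregated variants mentioned in the docstring discussion:
print(toxicity.compute(predictions=texts, aggregation="maximum"))  # highest single score
print(toxicity.compute(predictions=texts, aggregation="ratio"))    # share of predictions above a threshold
```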
Thanks for adding this - added a few final remarks and suggestions. Then it's good to go :)
    codebase_urls=[],
    reference_urls=[],
No references or code on GitHub we can reference here?
Not really, there is just the dataset that the toxicity model was trained on: https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset
Not sure if that's helpful.
Is there a reason we wouldn't want to link to the RealToxicityPrompts repo (https://github.com/allenai/real-toxicity-prompts)? The classifier is different (Perspective vs. the FAIR classifier), but it's the same idea, and RealToxicityPrompts has been a canonical citation for toxicity measurement over the past few years.
I'm not sure it's super useful, since it's a general toxicity measure used by lots of other repos, not only RealToxicityPrompts (which also uses a completely different model + approach).
Initial draft of the toxicity metric -- would love your thoughts, @mathemakitten and @lvwerra !