Toxicity Measurement #262

Merged: 42 commits into main, Aug 24, 2022

Conversation

sashavor (Contributor) opened this pull request:

Initial draft of the toxicity metric -- would love your thoughts, @mathemakitten and @lvwerra!

@sashavor sashavor requested a review from lvwerra August 18, 2022 19:53
HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

lvwerra (Member) left a comment:

Thanks for adding this! The PR is in pretty good shape already! Mostly added some comments about efficiently loading the pipeline.

Looking at the functionality, this also seems like a case where it is not clear why it shouldn't be a measurement: essentially you look at text, and it doesn't matter much whether it is generated or human-written.
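For context, a rough sketch of what calling this as a measurement could look like (the loading call and output shape here are assumptions based on this discussion and may differ from the merged version):

```python
import evaluate

# Load the module as a measurement rather than a metric: it scores raw text,
# and it does not matter whether that text was generated by a model or
# written by a human.
toxicity = evaluate.load("toxicity", module_type="measurement")

results = toxicity.compute(predictions=["this is a perfectly friendly sentence"])

# Expected to return one toxicity score per input string,
# e.g. {"toxicity": [0.0002]} (the value shown is illustrative).
print(results)
```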

Review comments (resolved): metrics/toxicity/toxicity.py, metrics/toxicity/requirements.txt
Sasha Luccioni and others added 3 commits August 19, 2022 08:24
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Sasha Luccioni and others added 3 commits August 19, 2022 09:14
sashavor changed the title from "Toxicity" to "Toxicity Measurement" on Aug 19, 2022
Review comments (resolved): measurements/toxicity/README.md, measurements/toxicity/toxicity.py

Args:
    `predictions` (list of str): prediction/candidate sentences
    `toxic_label` (optional): the toxic label that you want to detect, depending on the labels that the model has been trained on.
Contributor commented:

Add that the type is str. Should we specify that right now we only allow one label here? Toxicity is often a multi-class prediction problem along several axes (e.g. identity-based hate vs. racism), but right now we only handle one class.

sashavor (Contributor, Author) replied:

And how would you aggregate the results across different labels? For example, if you have {'offensive': 0.65, 'hate': 0.98}, then what?

Contributor replied:

Ah yes, aggregation would be a bit tricky. I think the Perspective API (as an example) reports these results back unaggregated across categories — there's an individual score for each category of 'identity hate', 'toxicity', 'sexism', 'racism', 'sexually explicit' etc. and they don't aggregate across categories.

I assume the idea is that as an end user of a toxicity API you'd want to handle cases of sexually explicit content differently than identity-based hate, so the granularity is helpful/necessary. In this case an equivalent process would be to not aggregate when there are several types of toxicity specified, and report back per-toxicity-class (e.g. "toxic_labels" is a list instead of a str). What do you think?

sashavor (Contributor, Author) replied:

At the moment I can only find binary hate-speech classification models on the Hub, so maybe we keep it like this for now?
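To make the single-label behaviour concrete, here is a rough sketch of how the score for one `toxic_label` could be pulled out of a text-classification pipeline. The checkpoint name and the helper function are illustrative, not necessarily what the module ships with:

```python
from transformers import pipeline

# Example of a binary hate-speech classifier from the Hub; the measurement's
# actual default checkpoint may differ.
classifier = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
    top_k=None,  # return the score of every label, not just the top one
)

def toxicity_scores(predictions, toxic_label="hate"):
    """Return the probability assigned to `toxic_label` for each input string."""
    scores = []
    for label_scores in classifier(predictions):
        score = next(
            (d["score"] for d in label_scores if d["label"] == toxic_label),
            0.0,  # the model does not produce this label at all
        )
        scores.append(score)
    return scores

print(toxicity_scores(["I hate you", "have a lovely day"]))
```

A per-label variant would simply keep the whole list of label scores for each prediction instead of selecting one entry, which matches the unaggregated, per-category reporting described above.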

Sasha Luccioni and others added 9 commits August 22, 2022 10:00
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
updating examples
Co-authored-by: helen <31600291+mathemakitten@users.noreply.github.com>
lvwerra (Member) left a comment:

Looks good, left mostly nits! I think there is an issue with the docstring, based on the CI error. Also, if you merge main into your branch, the CI timeout issue should go away.

Review comments (resolved): measurements/toxicity/toxicity.py
lvwerra (Member) commented on Aug 23, 2022:

From the CI:

Toxicity has inconsistent leading whitespace: '    `aggregation` (optional): determines the type of aggregation performed on the data. If set to `None`, the scores for each prediction are returned.'
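For reference, the fix is to give every line of the `Args:` block the same leading whitespace, along these lines (an illustrative sketch assuming the usual `_KWARGS_DESCRIPTION` docstring pattern, not the exact merged text):

```python
_KWARGS_DESCRIPTION = """
Compute the toxicity of the input texts.

Args:
    `predictions` (list of str): prediction/candidate sentences.
    `toxic_label` (str, optional): the toxic label to detect, depending on the labels the model was trained on.
    `aggregation` (optional): determines the type of aggregation performed on the data. If set to `None`, the scores for each prediction are returned.
"""
```

The point is simply that every argument line starts with the same four spaces, which appears to be what the check validates.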

sashavor and others added 12 commits August 23, 2022 08:58
lvwerra (Member) left a comment:

Thanks for adding this - I added a few final remarks and suggestions. Then it's good to go :)

Review comments (resolved): measurements/toxicity/README.md, measurements/toxicity/toxicity.py
Comment on lines +122 to +123
codebase_urls=[],
reference_urls=[],
Member commented:

Are there no references or code on GitHub that we can reference here?

sashavor (Contributor, Author) replied:

Not really, there is just the dataset that the toxicity model was trained on: https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset

Not sure if that's helpful.

Contributor replied:

Is there a reason we wouldn't want to link to the RealToxicityPrompts repo? The classifier is different (Perspective vs. FAIR classifier) but it's the same idea, and RealToxicityPrompts is a canonical citation for the toxicity metric in the past few years.

https://github.com/allenai/real-toxicity-prompts/
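For what it's worth, filling in the two empty fields with the links mentioned in this thread could look like the sketch below; the merged module may cite different or additional resources:

```python
# Links discussed in this thread: the RealToxicityPrompts repo and the dataset
# that the default toxicity classifier was trained on.
codebase_urls = ["https://github.com/allenai/real-toxicity-prompts/"]
reference_urls = ["https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset"]
```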

Sasha Luccioni and others added 6 commits August 24, 2022 11:59
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
@sashavor sashavor merged commit e78e709 into main Aug 24, 2022
@sashavor sashavor deleted the toxicity branch August 24, 2022 16:30
sashavor (Contributor, Author) commented on Aug 24, 2022 via email.
