fix: Keep all the Curse Words #23

Open
wants to merge 1 commit into
base: master
from

Conversation

HouseholdVTuber

Hi there,
I uncensored the words in ..schrute\Python\office_transcript.csv, and then ran ..schrute\data-raw\get_data.R to save the updated contents to ..schrute\data\theoffice.rda.

For the most part, words like sex/bastard/asshole/etc. were censored with asterisks, but the names of sex organs (penis/vagina) were censored with l33tspeak, like pen1s and vg1n. I replaced those too.

…nd exported changed content to theoffice.rda. Words censored with asterisks and words censored with l33tspeak were both restored.
@HouseholdVTuber
Author

I made this change assuming that the transcript of The Office will, well, never change, so there would probably be no need to run get_transcript.py again. But that script is part of the repo and could theoretically be re-run, so I'm going to go ahead and modify it to replace the known censored curse words (s*x, etc.) after they're scraped from the URLs and before they're written to the .csv.
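
For illustration, here is a rough sketch of what that replacement step might look like in Python. It is not the actual change to get_transcript.py: the names `CENSORED_TO_ORIGINAL`, `uncensor`, and `write_rows` are hypothetical, the mapping only contains the censored forms mentioned in this thread, and the two-column (speaker, line) row layout is an assumption about the scraped data.

```python
import csv
import re

# Hypothetical mapping: only the censored forms named in this thread.
# The full list would have to be collected from the scraped text.
CENSORED_TO_ORIGINAL = {
    "s*x": "sex",
    "pen1s": "penis",
    "vg1n": "vagina",
}

# One alternation over all censored forms; re.escape keeps "*" literal.
_PATTERN = re.compile(
    "|".join(re.escape(form) for form in CENSORED_TO_ORIGINAL),
    re.IGNORECASE,
)


def uncensor(text):
    """Replace each known censored form in text with the original word."""

    def restore(match):
        found = match.group(0)
        original = CENSORED_TO_ORIGINAL[found.lower()]
        # Keep a leading capital if the censored form had one.
        return original.capitalize() if found[0].isupper() else original

    return _PATTERN.sub(restore, text)


def write_rows(rows, out_file):
    """Write (speaker, line) rows to CSV, uncensoring each line first.

    The (speaker, line) layout is assumed for this sketch only.
    """
    writer = csv.writer(out_file)
    for speaker, line in rows:
        writer.writerow([speaker, uncensor(line)])
```

Doing the replacement between scraping and writing means the .csv never contains the censored forms, so a later re-run of the script would not undo the manual find-replace.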

@bradlindblad
Owner

@HouseholdVTuber let me know when you make those changes. I agree that find-replace in the CSV probably isn't best practice.
Thanks!
