01_litigations_lead.Rmd Review #2

Open
romartinez-nycc opened this issue Apr 24, 2023 · 0 comments
Labels: documentation (Improvements or additions to documentation)

romartinez-nycc (Contributor) commented:

Nick commented on some of these as well, so sorry for the duplication!

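  • This works for me: `source("../code/utils/00_load_dependencies.R")` instead of
    ```r
    setwd("/Users/rhirota/Documents/GitHub/lead_hearing")
    source("code/utils/00_load_dependencies.R")
    ```
    The relative path does not depend on one person's local folder, so it should run for anyone who clones the repo (knitr evaluates chunks from the .Rmd file's own directory by default).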
  • Option to use `vroom()`: the vroom package may read very large files faster than `fread()` (it indexes the file and loads values lazily). If the data is really large, I recommend reading it with vroom and then converting the result to a data.table.
    ```r
    litigations <- fread("https://data.cityofnewyork.us/resource/59kj-x8nc.csv?$limit=99999999999")
    ```
  • Please comment the code here for non-data.table folks:
    ```r
    litigations[, yr := year(caseopendate)]
    litigations <- litigations[, yr := year(caseopendate)]
    litigations <- litigations[litigations$yr>2005,] # not complete data
    ```
    ```r
    # ---- CLEAN DATA ----
    lead_litigations <- unique(litigations[grep("lead", casetype, ignore.case = TRUE), ])
    lead_litigations <- unique(lead_litigations[!grep("non-lead", casetype, ignore.case = TRUE), ])
    lead_litigations[, lit_bbl := length(unique(litigationid)), by = .(bbl)]
    ```
  • I added dplyr as a required library in `00_load_dependencies.R` to run the following:
    ```r
    temp <- lead_litigations %>% group_by(yr) %>% summarise(count_lead = n())
    temp2 <- litigations %>% group_by(yr) %>% summarise(count_total = n())
    temp <- temp2 %>% full_join(y = temp, by = 'yr')
    temp <- temp[temp$yr <= 2023 & !is.na(temp$yr) & !is.na(temp$count_lead), ] # not sure why NA/2030 is in there... data entry error?
    ```
  • I added ggplot2 to run the plots below:
    ```r
    # ---- plot number of lead litigations by year
    ```
    I recommend breaking each plot out into its own code chunk and spacing out the lines to improve readability.
  • Let's move this to the bottom or remove it (but please save it somewhere else for future work!):
    Possible interest: Open judgement (yes) or harassment found or penalty assigned
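For the vroom suggestion, here is a minimal sketch of reading with `vroom()` and then converting to a data.table, assuming the vroom and data.table packages are installed. To keep it runnable offline it reads a small hypothetical CSV written to a temporary file; the rows and values are made up, and in the notebook you would pass the NYC Open Data URL instead.

```r
library(vroom)
library(data.table)

# Hypothetical stand-in for the litigations CSV (made-up rows; column names follow the notebook).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "litigationid,bbl,casetype,caseopendate",
  "1,1000010001,Heat and Hot Water,2019-03-01",
  "2,1000010001,False Certification Non-Lead,2020-07-15"
), tmp)

# vroom() indexes the file and materializes column values lazily.
litigations <- vroom(tmp, show_col_types = FALSE)

# Convert the tibble to a data.table so the existing `[, := , by]` code keeps working.
litigations <- as.data.table(litigations)
```

`as.data.table()` copies the data; `setDT()` would convert in place instead, which may matter for very large tables.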
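For the request to comment the data.table code, one way the year and cleaning steps could be annotated for readers who know dplyr or base R is sketched below. It runs on a hypothetical toy table (made-up rows), keeps the notebook's column names, and drops the redundant `litigations <- ...` reassignment, because `:=` already adds the column by reference. `uniqueN()` is a data.table shorthand for `length(unique(...))`.

```r
library(data.table)

# Hypothetical toy rows standing in for the litigations table.
litigations <- data.table(
  litigationid = 1:5,
  bbl          = c(101, 101, 102, 103, 103),
  casetype     = c("Heat and Hot Water", "Lead", "False Certification Non-Lead",
                   "Lead", "Lead"),
  caseopendate = as.IDate(c("2004-05-01", "2015-02-10", "2016-08-20",
                            "2018-01-05", "2021-11-30"))
)

# Add a year column. `:=` modifies the table in place (by reference),
# so there is no need to also assign the result with `litigations <- ...`.
litigations[, yr := year(caseopendate)]

# Keep cases opened after 2005 (earlier years are incomplete).
# DT[condition] filters rows like dplyr::filter(); columns are referenced by bare name.
litigations <- litigations[yr > 2005]

# Keep rows whose case type mentions "lead" (case-insensitive) ...
lead_litigations <- unique(litigations[grep("lead", casetype, ignore.case = TRUE)])
# ... then drop the "non-lead" ones: `!` before a row selection excludes those rows.
lead_litigations <- lead_litigations[!grep("non-lead", casetype, ignore.case = TRUE)]

# Count distinct lead litigations per building (BBL) as a new column,
# similar to group_by(bbl) %>% mutate(lit_bbl = n_distinct(litigationid)).
lead_litigations[, lit_bbl := uniqueN(litigationid), by = .(bbl)]
```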
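The dplyr bullet adds a dependency just for the yearly counts. Since the notebook already works in data.table, the same summary could be built without dplyr; this is a sketch of that alternative on hypothetical toy inputs, not the author's code. `.N` is data.table's row count per group, and `merge(..., all = TRUE)` plays the role of `full_join()`.

```r
library(data.table)

# Hypothetical toy inputs; in the notebook these come from the cleaning chunk.
litigations <- data.table(
  litigationid = 1:6,
  yr = c(2019L, 2019L, 2020L, 2021L, NA, 2030L)
)
lead_litigations <- litigations[litigationid %in% c(1L, 3L)]

# Rows per year, like group_by(yr) %>% summarise(n()).
count_total <- litigations[, .(count_total = .N), by = yr]
count_lead  <- lead_litigations[, .(count_lead = .N), by = yr]

# Full outer join on yr; all = TRUE keeps years found in only one table.
temp <- merge(count_total, count_lead, by = "yr", all = TRUE)

# Drop missing or out-of-range years and years with no lead cases,
# mirroring the filter in the dplyr version.
temp <- temp[!is.na(yr) & yr <= 2023 & !is.na(count_lead)]
```

Whether years with no lead cases should be dropped or shown as zero is a separate choice; this sketch keeps the review's filter.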
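To illustrate the "one plot per chunk" suggestion, a single plot chunk might look like the sketch below, assuming ggplot2 is loaded and a yearly table shaped like `temp` (columns `yr` and `count_lead`) exists. The counts here are made-up placeholders, and the axis labels and the name `p_lead_by_year` are hypothetical.

```r
library(ggplot2)
library(data.table)

# Hypothetical yearly counts shaped like `temp` (placeholder values, not real data).
temp <- data.table(
  yr         = 2019:2022,
  count_lead = c(3L, 1L, 4L, 2L)
)

# ---- plot number of lead litigations by year ----
p_lead_by_year <- ggplot(temp, aes(x = yr, y = count_lead)) +
  geom_col() +
  labs(
    x = "Year case opened",
    y = "Number of lead litigations"
  )

# Printing the object renders the plot in the knitted document.
p_lead_by_year
```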
romartinez-nycc added the documentation label (Improvements or additions to documentation) on Apr 24, 2023.