Use readr::read_delim as a parser for httr::content()? #239

Closed
jennybc opened this issue May 23, 2015 · 3 comments

Comments

@jennybc
Member

jennybc commented May 23, 2015

There's a nice consistent thing going on in readr and readxl re: col_names and col_types. And read_delim() has lots of other useful features, e.g. not converting character to factor. It would be great to exploit all of that when parsing the content of requests in httr. Do you have any plans to switch over to readr::read_delim() when MIME type is text/csv or text/tab-separated-values?

Context: I want to bring Google Sheets into R in a way that's as compatible as possible with those other packages, with minimal fuss under the hood. If text/csv content was parsed with readr::read_csv(), that would go a long way. (I realize I can do this already by setting as = "text" and parsing myself with read_csv().)
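The workaround mentioned above (read the body as text, then hand it to readr) can be sketched roughly as follows. This is only an illustration: `parse_delim_text()` is a hypothetical helper, not something httr or readr provides, the literal strings stand in for what `httr::content(resp, as = "text", encoding = "UTF-8")` would return, and wrapping the text in `I()` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical helper, not part of httr or readr: choose a readr parser from
# the MIME type of a response body that has already been read as text, e.g.
# via httr::content(resp, as = "text", encoding = "UTF-8").
parse_delim_text <- function(txt, mime_type, ...) {
  switch(
    mime_type,
    "text/csv" = read_csv(I(txt), ...),
    "text/tab-separated-values" = read_tsv(I(txt), ...),
    stop("Unsupported MIME type: ", mime_type)
  )
}

# Literal strings stand in for real response bodies (assumption: no network).
csv_df <- parse_delim_text("x,y\n1,a\n2,b\n", "text/csv", col_types = "ic")
tsv_df <- parse_delim_text(
  "x\ty\n1\ta\n2\tb\n", "text/tab-separated-values", col_types = "ic"
)
```

Extra arguments such as `col_names` and `col_types` pass straight through to readr, which is the compatibility being asked for here.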

@hadley
Member

hadley commented May 25, 2015

Yes - I also need to replace XML with xml2. (Although that's a bigger change because it's much more likely to break existing code and hence needs a big version bump and longer release announcement period)

@jennybc
Member Author

jennybc commented May 25, 2015

You seem to be implicitly creating a standard for bringing (tabular) data into a data.frame-flavored object. Is there an actual one written down somewhere or lurking in your head? In googlesheets, I must either parse content with MIME type "text/csv" or fish stuff out of XML or JSON and marshal with dplyr::data_frame_(). In either case, I am (weakly) trying to match behaviour of readxl::read_excel() and the capabilities of read.table(). I imagine other folks are also trying to write packages that create the "least surprise" for users of dplyr, readr, and readxl.

These seem to be key parts of the data ingest contract (paraphrasing from various docs and READMEs):

  • Don't coerce inputs, e.g. stringsAsFactors = FALSE goddamit.
  • Don't set row.names.
  • Don't munge column names (this one is less of a no-brainer for me than the others).
  • Recognize dates and date-times, at least for a reasonable default format. Always convert to POSIXct?
  • col_names is the new and improved header = TRUE. Either logical indicating that first row gives variable names or character vector of names.
  • col_types is the new and improved colClasses, which uses a neat system of integer codes or, more generally, "collectors".
  • something about encoding as UTF-8?
  • something about empty rows/columns?
  • Add tbl_df class to output.
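A few of these points can be checked against readr directly. The snippet below is a minimal sketch, not an official statement of the contract: the sample data is made up, `I()` for literal input assumes readr 2.0 or later, and `col_types` uses readr's compact one-letter string codes (`"c"` for character, `"D"` for date).

```r
library(readr)

# Made-up sample data; note the space in the first column name.
txt <- "first name,when\nada,2015-05-23\ngrace,2015-05-25\n"

# col_names = TRUE (the default): the first row supplies the variable names.
# col_types = "cD": character column, then date column.
df1 <- read_csv(I(txt), col_types = "cD")

names(df1)                # column names are kept as given, not munged
class(df1$`first name`)   # character, not factor
class(df1$when)           # Date
inherits(df1, "tbl_df")   # the result carries the tbl_df class

# col_names as a character vector: names are supplied by the caller and
# every row of the input is treated as data.
df2 <- read_csv(
  I("ada,2015-05-23\n"),
  col_names = c("name", "when"),
  col_types = "cD"
)
```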

Stuff under discussion that I'm especially interested in:

Not sure httr was the right place to put this now … 😞

@hadley
Member

hadley commented May 26, 2015

Yeah, that's everything I can think of. I've included some of these points in a recent slide deck, but there's nothing official in the documentation anywhere (not least because it's not clear where it should go)

@hadley hadley closed this as completed in b529686 Dec 17, 2015