Use readr::read_delim as a parser for httr::content()? #239

jennybc · 2015-05-23T21:47:55Z

There's a nice consistent thing going on in readr and readxl re: col_names and col_types. And read_delim() has lots of other useful features, e.g. not converting character to factor. It would be great to exploit all of that when parsing the content of requests in httr. Do you have any plans to switch over to readr::read_delim() when MIME type is text/csv or text/tab-separated-values?

Context: I want to bring Google Sheets into R in a way that's as compatible as possible with those other packages, with minimal fuss under the hood. If text/csv content was parsed with readr::read_csv(), that would go a long way. (I realize I can do this already by setting as = "text" and parsing myself with read_csv().)
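A minimal sketch of that workaround, for reference. The helper name `parse_csv_text()` and the sample body are hypothetical; with a real response object `resp`, the string would come from `httr::content(resp, as = "text", encoding = "UTF-8")`. The use of `I()` assumes readr 2.0.0 or later.

```r
library(readr)

# Hypothetical helper: parse a text/csv body that has already been
# extracted as a single string, e.g. via
#   txt <- httr::content(resp, as = "text", encoding = "UTF-8")
parse_csv_text <- function(txt, ...) {
  # I() marks the string as literal data rather than a file path
  # (assumes readr >= 2.0.0)
  read_csv(I(txt), ..., show_col_types = FALSE)
}

# Stand-in for a response body, so the sketch runs without a network call
txt <- "name,score\nalice,1.5\nbob,2\n"
df <- parse_csv_text(txt)
```

Extra arguments such as `col_names` or `col_types` pass straight through `...`, which is the part that would be lost if the content were parsed with `read.csv()` defaults.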


hadley · 2015-05-25T13:20:03Z

Yes - I also need to replace XML with xml2. (Although that's a bigger change because it's much more likely to break existing code and hence needs a big version bump and a longer release announcement period.)

jennybc · 2015-05-25T17:07:48Z

You seem to be implicitly creating a standard for bringing (tabular) data into a data.frame-flavored object. Is there an actual one written down somewhere or lurking in your head? In googlesheets, I must either parse content with MIME type "text/csv" or fish stuff out of XML or JSON and marshal with dplyr::data_frame_(). In either case, I am (weakly) trying to match behaviour of readxl::read_excel() and the capabilities of read.table(). I imagine other folks are also trying to write packages that create the "least surprise" for users of dplyr, readr, and readxl.

These seem to be key parts of the data ingest contract (paraphrasing from various docs and READMEs):

Don't coerce inputs, e.g. stringsAsFactors = FALSE goddamit.
Don't set row.names.
Don't munge column names (this one is less of a no-brainer for me than the others).
Recognize dates and date-times, at least for a reasonable default format, and always convert to POSIXct?
col_names is the new and improved header = TRUE. Either logical indicating that first row gives variable names or character vector of names.
col_types is the new and improved colClasses, which uses a neat system of integer codes or, more generally, "collectors".
something about encoding as UTF-8?
something about empty rows/columns?
Add tbl_df class to output.
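Several of the points above can be seen in a small readr sketch. This is only an illustration of how readr behaves, not a written-down standard; the sample data and column names are made up, the compact string form of `col_types` is used, and `I()` and `show_col_types` assume readr 2.0.0 or later.

```r
library(readr)

# Made-up sample data; note the space in the first column name
csv <- "first name,when,group\nalice,2015-05-23 21:47:55,a\nbob,2015-05-25 13:20:03,b\n"

# Default guessing: strings stay character, names are not munged,
# and ISO 8601 date-times are recognised as POSIXct
guessed <- read_csv(I(csv), show_col_types = FALSE)

# col_names as a character vector supplies the names, so the header
# row is skipped; col_types as a compact string:
# c = character, T = date-time
explicit <- read_csv(
  I(csv),
  col_names = c("who", "when", "grp"),
  col_types = "cTc",
  skip = 1
)

class(guessed)   # includes "tbl_df"
names(guessed)   # "first name" is kept as-is
```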

Stuff under discussion that I'm especially interested in:

ability to provide a character vector of strings which are to be interpreted as NA values, a la na.strings from read.table() (see tidyverse/readxl#13 "Feature request: allow multiple values for na" and tidyverse/readr#125 "Multiple strings for missing values")
a more flexible way to skip lines than skip … maybe an extension of the trusty old comment.char from read.table()? (see tidyverse/readr#68 "Ignore comments in tokenizer", tidyverse/readr#179 "skip lines after the header", tidyverse/readr#167 "add comment_char argument to skip lines beginning with a comment marker?", tidyverse/readr#88 "add skip_footer", and tidyverse/readr#67 "Implement read_nonmem")
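For what it's worth, readr releases after this discussion accept a character vector for `na` and have a `comment` argument. A rough sketch of both, with made-up data and again assuming readr 2.0.0 or later for `I()` and `show_col_types`:

```r
library(readr)

# Made-up data: a leading comment line and two different NA markers
csv <- "# exported from a sheet\nid,value\n1,3.2\n2,n/a\n3,-\n"

df <- read_csv(
  I(csv),
  na = c("", "NA", "n/a", "-"),  # several strings treated as missing
  comment = "#",                 # text after this marker is ignored
  show_col_types = FALSE
)
```

Because both markers are mapped to NA before type guessing, `value` should come back as a numeric column rather than character.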

Not sure httr was the right place to put this now … 😞

hadley · 2015-05-26T12:08:26Z

Yeah, that's everything I can think of. I've included some of these points in a recent slide deck, but there's nothing official in the documentation anywhere (not least because it's not clear where it should go).

hadley closed this as completed in b529686 Dec 17, 2015
