-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional anchor and ncol #8
Comments
Shouldn't be too hard to translate (e.g.) |
@zentree would you prefer a specification like this? |
I'd second this, both the functionality of being able to pull out a rectangular selection and the ability to specify that selection with Excel-like references. I'm currently using XLConnect's |
Specifying the number of rows should not be a must, as you typically don't know the number of rows, |
I would go with the "A1:Z31" syntax. Typically an excel user will be referencing the data cells anyway in the sheet so they'll be thinking in those terms. You might also want to have it skip the "$" notation in excel, so that "A$1:$B$32" returns the same cell range as A1:B32, just because the user might be copying the reference from an excel sheet. . |
Rectangular specification would be a nice way to deal with number of rows too. From a user point of view I have to deal with two cases:
I see a notation like A1:Z32 dealing mostly with case 2. |
Excellent new package; so is readr. I'd use two extra options a lot:
|
I'm doing exactly this over in googlesheets, which consumes (and writes) data from (to) Google Spreadsheets. For the targeted data consumption discussed here, the Sheets API forces you to use a cell-by-cell approach (the incredibly slow "cell feed"). In this case, my function lets user specify the data rectangle via min row, max row, min col, max col. There are convenience wrappers to get one or more rows, one or more columns, or a region specified like B3:G17 or R3C2:R17C7 (the other standard positioning notation). Then that output can be processed by one of two functions for reshaping and/or transformation, where there is an argument for header = TRUE/FALSE.(Hmmm... I should probably write even more wrappers to package that sequence of actions.) https://github.com/jennybc/googlesheets/blob/master/R/consume-data.R I'm really interested in this thread because it would be great to keep the two interfaces as similar as possible. I even volunteer to help! |
@jennybc I'd love to incorporate your cell specification code in readxl. I think bundling all the options into one argument plus helper functions would be ideal. Something like this: read_excel(..., range = "D12:F15")
read_excel(..., range = rc("R1C12:R6C15"))
read_excel(..., range = excel_range(c(1, 6), c(1, 15))
read_excel(..., range = excel_range(c(2, NA), c(1, NA)) |
OK I will isolate and upgrade cell specification over in googlesheets and get back to you. The helper functions are nice idea.
|
The "B7:Z18" syntax sounds great! Just please make sure it can handle more than 26 columns. |
@wdkrnls What is the max number of columns in Excel? In Google sheets, it's only 300. It is helpful to know this upper bound for the function that translates columns in "ABDC" positioning notation to actual column numbers. |
According to this source excel 2010 is limited to 16384 columns. |
My excel sheets regularly have hundreds of columns by the time I get them. I don't have exact statistics, but my data come from arrays of micro reactors hooked up to online chromatography and have between 500-1000 columns covering concentrations of myriads of detectable chemicals. |
@hadley readxl currently has no dependencies other than Rcpp. Would you want that to be true of the range specifying stuff as well, i.e. it's part of the package philosophy? |
I'd prefer to keep dependencies minimal, but it seems like this code needs to be shared between readxl and spreadr |
I'll try to do it that way then. And look wistfully at stringr and |
In a branch, I isolated the the cell specification functions and refactored them to have no dependencies: https://github.com/jennybc/googlesheets/blob/cell-range-specification/R/cell-specification.R It's basically 3 pairs of functions:
I wanted to check back in before going further. Hopefully this get things rolling, to settle on user-facing bits. I don't export any of the above. Tests are here: |
Would I be able to use excel_range to select all columns to the right? e.g. excel_range(c(15, NA), c(NA, NA)) This feature would be very useful for reading and cleaning the output of one of the data sources I have. |
@wdkrnls Yes that's my interpretation of the proposal and what the NAs would mean. However, I would interpret your example as requesting to read from row 15 on, not column. I'd assume this:
where NA for a max means no limit and NA for a min is equivalent to specifying the value as 1. |
Fair enough, I was imagining specifying the corners of the box as c(column, row) pairs analogous to the excel syntax, but your interpretation sounds great to me. |
@hadley Let me specify where I need your guidance. Ways to specify cells. From most Excel-ish and least programmatically useful to least Excel-ish and most programmatically useful:
Internally, my current manipulations always move things down that list above. But the way I read your proposed helper functions:
… they seem to imply transformations up the list, i.e. towards the Should I just get over it and proceed anyway? I realize there is no concern about speed or anything. Is it overkill to have an S3 class for a cell range that holds the specifications in all the formats, so it's always easy to print or use the one best for the task at hand? Do you approve of row or column-based helpers:
|
Altogether I think it would look something like this: excel_range <- function(rows, col = cols) {
stopifnot(is.numeric(rows), length(rows) == 2)
stopifnot(is.numeric(cols), length(cols) == 2)
structure(list(rows = rows, cols = cols), class = "excel_range")
}
rc <- function(x) {
...
excel_range(rows, cols)
}
as.excel_range <- function(x) UseMethod("as.excel_range")
as.excel_range.excel_range <- function(x) x
as.excel_range.character <- function(x) rc(x) and then Row and column based helpers would be fine (although I'd worry a little bit about giving them such short names - generally better to preserve short names for widely applicable tools) |
Perfect. I will push it forward and report back. |
I wrote the parts to process cell specification by the user. This is still in a branch for me, but there I am using these functions myself. FWIW all is well re: tests and travis. All the cell specification stuff that could be in common is in this file and there are no package dependencies: https://github.com/jennybc/googlesheets/blob/cell-range-specification/R/cell-specification.R Tests are here: I renamed the class from Haven't added any row or column helpers yet. This should put you in a position to modify
|
I should probably implement @gshotwell's suggestion to ignore |
Looks good. I agree you should just ignore Next step is to figure out where to put this? Maybe we should have a small cell tools package? |
I can put this in a little package. Call it … I'll ignore |
Cell ranges sounds good to me |
Really excited to try this out! |
I've put this stuff in a package, |
You forgot to remove the last 'e' ;) it's convention! |
The helper package is on CRAN now: http://cran.r-project.org/web/packages/cellranger/index.html |
Thanks so much for this! I'm really looking forward to never typing |
Could this be extended to support the following?
|
What's the advantage of that versus the already existing |
It mirrors the representation that Excel uses for ranges, e.g., for named ranges (#79). Not a big deal, but wanted to throw it out there. |
It's reasonable to think about expanding the notion of a I'm not exactly eager to rework the handling of worksheets … but it's worth considering. |
Bump: is there a way to access this functionality yet? Sorry if it's been implemented and I missed it, but I claim it's not obvious ... My use case is just to be able to select a limited number of rows. The spreadsheet has header rows interspersed with data (sigh): if I follow the advice to read the whole file (possibly skipping initial rows) and discard the stuff I don't need, then I have the hassle of converting columns back from character to numeric ... |
@bbolker Not yet, but this package is now a main focus of mine. Once I'm done with basic triage of the issues/pull requests, this feature will be a high priority. Closing the longest-running issue will be a real pleasure. |
It would be nice to be able to specify where the rectangular data part is located within a sheet (some people like to have "non-data" related things in their sheets too). One way could be to specify the top-left anchor cell, say "D5", and optionally the number of columns to use.
The text was updated successfully, but these errors were encountered: