Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert readxl from Rcpp to cpp11 #659

Merged
merged 32 commits into from
Aug 6, 2021
Merged

Conversation

sbearrows
Copy link
Contributor

This initial commit contains the first steps for converting from Rcpp to cpp11

Now I'll convert file by file and push each time I've completed a file so we can review.

src/zip.cpp Outdated Show resolved Hide resolved
src/zip.cpp Outdated Show resolved Hide resolved
@sbearrows
Copy link
Contributor Author

I had to edit a few other files to define the macros R_NO_REMAP and STRICT_R_HEADERS to get rid of the error message:

R headers were included before cpp11 headers and at least one of R_NO_REMAP or STRICT_R_HEADERS was not defined.

It was easier then try to figure out how to reorganize the header files.

src/CellLimits.h Outdated Show resolved Hide resolved
Copy link
Contributor

@jimhester jimhester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few cases where as far as I could tell you were converting an R SEXP to another R SEXP using as_cpp(), which you should never need to do.

The mental model is as_cpp() is used to convert a SEXP to a native C++ type (like std::string, int, etc.) and as_sexp() is used to go the opposite way, e.g. convert a C++ type to an R SEXP.

But in general you should have to call these explicitly somewhat rarely, most of the time they can be done implicitly in the generated registration code.

src/XlsxWorkSheet.cpp Outdated Show resolved Hide resolved
@@ -198,7 +204,7 @@ class XlsxWorkSheet {

xcell->inferType(na, trimWs, wb_.stringTable(), dateFormats_);
CellType type = xcell->type();
Rcpp::RObject column = cols[col];
cpp11::sexp column = cpp11::as_sexp(cols[col]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this call to as_sexp()? I would guess no? maybe just this?

Suggested change
cpp11::sexp column = cpp11::as_sexp(cols[col]);
cpp11::sexp column(cols[col]);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up having to do something similar to what we did with
std::vector<ColType> colTypes = colTypeStrings(static_cast<SEXP>(col_types));
Because I was getting a constructor error but let me know if there is another solution!


// catches empty sheets and sheets where requested rectangle contains no data
if (ws.nrow() == 0 && ws.ncol() == 0) {
return Rcpp::List(0);
cpp11::list x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need to be a writable list I think? Otherwise it is going to return NULL instead of list()

Suggested change
cpp11::list x;
cpp11::writable::list();

bool has_col_names = false;
switch(TYPEOF(col_names)) {
case STRSXP:
colNames = as<CharacterVector>(col_names);
colNames = cpp11::as_cpp<cpp11::strings>(col_names);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need any cast? just colNames = col_names

@@ -47,7 +48,7 @@ List read_xlsx_(std::string path, int sheet_i,
if (TYPEOF(col_types) != STRSXP) {
Rcpp::stop("`col_types` must be a character vector");
}
std::vector<ColType> colTypes = colTypeStrings(as<CharacterVector>(col_types));
std::vector<ColType> colTypes = colTypeStrings(cpp11::as_cpp<cpp11::strings>(col_types));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure you need a cast here either

src/ColSpec.h Outdated Show resolved Hide resolved
src/CellLimits.h Outdated Show resolved Hide resolved
Comment on lines +7 to +8
#include <set>
#include <vector>
Copy link
Member

@jennybc jennybc Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to keep the system includes, e.g. <whatever> together and above the ones defined here, e.g. "utils.h"?

If you agree, that would be a good thing to check globally. I think that is (mostly?) how things started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for reminding me about this.
I rearranged some of the header files to avoid conflicts with Rcpp and cpp11 libraries so I need to reorganize them (or organize them back to how you had them). I ended up using the macros #define R_NO_REMAP and #define STRICT_R_HEADERS instead to avoid the conflicts, which I'll remove since they won't be needed anymore once this PR is closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state seems to be much more regular, but often puts the local headers before the system headers. It feels like a more conventional order would be: system, vendored code (libxls), written-by-us, which also reflects intended precedence. (Let me also acknowledge that this probably doesn't matter and is more a matter of style at this point.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes sorry! I should have reread your comment before executing based on my memory. I'll rearrange in that order.

src/XlsWorkSheet.cpp Outdated Show resolved Hide resolved
src/XlsWorkSheet.h Outdated Show resolved Hide resolved
src/XlsxWorkSheet.cpp Outdated Show resolved Hide resolved
src/XlsxWorkSheet.h Outdated Show resolved Hide resolved
src/StringSet.h Outdated Show resolved Hide resolved
src/ColSpec.h Outdated Show resolved Hide resolved
src/CellLimits.h Outdated Show resolved Hide resolved
Copy link
Contributor

@jimhester jimhester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to remove the rest of the Rcpp references as well. You can find them with find my files or with git grep Rcpp

  1. Remove Rcpp references from Imports and LinkingTo
  2. Remove the importFrom Rcpp roxygen annotation in R/readxl.R (and run devtools::document() afterwards)
  3. Remove any stray #include "Rcpp.h" directives

You may find some compilation errors, as Rcpp includes a number of standard headers which cpp11 may not. If any of these (like std::vector or std::map etc.) are used in the generated cpp11.cpp code you will need to include them in a special file, src/readxl_types.h which cpp11 will automatically include in the generated file.

@jimhester
Copy link
Contributor

It looks like the only test still failing is on windows release due to std::assert not being declared, you should be able to #include <cassert> before the file including the libxls headers to fix this.

R/read_excel.R Outdated Show resolved Hide resolved
jimhester and others added 4 commits August 5, 2021 16:32
I think the Rcpp::stop crash is happening because unwind protect does a
long jump, and long jumping from a constructor is AFAIK undefined
behavior.

I am not clear about the cpp11::warning case, but possibly it could be a similar
issue.
R/read_excel.R Outdated Show resolved Hide resolved
sbearrows and others added 3 commits August 6, 2021 12:02
cpp11 automatically translates strings to UTF-8 when you construct a std::string. Rcpp did not do this, and the existing tests assumed they would be converted in the native encoding.

so we do this explicitly using Rf_translateChar()
@jimhester jimhester merged commit 5682328 into tidyverse:master Aug 6, 2021
@jimhester
Copy link
Contributor

Thanks a million!

@jennybc
Copy link
Member

jennybc commented Aug 6, 2021

Woohoo! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants