Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert readxl from Rcpp to cpp11 #659

Merged
merged 32 commits into from
Aug 6, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
40d1300
init convert to cpp11
Jul 28, 2021
2a20277
check readxl fx in zip
Jul 29, 2021
0faa29a
add Jims changes
Jul 29, 2021
aa0523a
Merge branch 'master' into use_cpp11
sbearrows Jul 29, 2021
ca4ca54
Merge commit 'd7b71b0a0c63d2450d26f35e986748526a523feb'
Jul 29, 2021
ec7c62c
convert StringSet.h and define macros for cpp11
Jul 30, 2021
7480b9a
Merge commit 'aa0523ad54e80b3cea4c6426883dee27588b1bda'
Jul 30, 2021
1278c16
converted utils,h
Jul 30, 2021
0e48fb4
Remove Rcpp exports
jimhester Jul 30, 2021
8299cd7
Reemove RcppExports R file as well
jimhester Jul 30, 2021
4e6aab5
Include cpp11.cpp
jimhester Jul 30, 2021
8574e0c
convert CellLimits and friends and update tests
Aug 2, 2021
3f7553f
Jims edits
Aug 2, 2021
0470661
fixed crash
Aug 3, 2021
79fc036
remove changes to limits
Aug 3, 2021
5f07e14
fix strlen error
Aug 3, 2021
988bb94
Use R_xlen_t literal syntax
jimhester Aug 3, 2021
51a5957
converting more files
Aug 4, 2021
33d6136
convert WorkBook and friends
Aug 4, 2021
d6bf9a3
use r_string for sexp
Aug 4, 2021
a50c859
Include visibility in the exported init function
jimhester Aug 4, 2021
7c8ad23
fix current errors with Jim
Aug 4, 2021
8bc4c5c
converted ColSpec
Aug 4, 2021
deecf96
Merge commit 'a50c859972ccb2725534d9c89c6ae66cd4cb1d78'
Aug 4, 2021
9215609
reorg header files and change limits to ints
Aug 5, 2021
24a8970
Fix crashes
jimhester Aug 5, 2021
51e82a9
remove all refs to Rcpp
Aug 5, 2021
db8fc06
remove excess writable vars
Aug 5, 2021
1f20ad9
Use a dev branch of cpp11
jennybc Aug 6, 2021
2842599
clean up and reorg headers
Aug 6, 2021
33c79a5
Tweaks to get filenames to work in read_xls_
jimhester Aug 6, 2021
5aaf592
Regenerate registrations
jimhester Aug 6, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ Suggests:
rprojroot (>= 1.1),
testthat (>= 2.0.0)
LinkingTo:
cpp11,
progress,
Rcpp
VignetteBuilder:
Expand All @@ -65,3 +66,4 @@ Encoding: UTF-8
Note: libxls v1.6.2 4482400
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1.9001
SystemRequirements: C++11
37 changes: 37 additions & 0 deletions R/cpp11.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Generated by cpp11: do not edit by hand

xls_sheets <- function(path) {
.Call(`_readxl_xls_sheets`, path)
}

xls_date_formats <- function(path) {
.Call(`_readxl_xls_date_formats`, path)
}

read_xls_ <- function(path, sheet_i, limits, shim, col_names, col_types, na, trim_ws, guess_max, progress) {
.Call(`_readxl_read_xls_`, path, sheet_i, limits, shim, col_names, col_types, na, trim_ws, guess_max, progress)
}

xlsx_sheets <- function(path) {
.Call(`_readxl_xlsx_sheets`, path)
}

xlsx_strings <- function(path) {
.Call(`_readxl_xlsx_strings`, path)
}

xlsx_date_formats <- function(path) {
.Call(`_readxl_xlsx_date_formats`, path)
}

parse_ref <- function(ref) {
.Call(`_readxl_parse_ref`, ref)
}

read_xlsx_ <- function(path, sheet_i, limits, shim, col_names, col_types, na, trim_ws, guess_max, progress) {
.Call(`_readxl_read_xlsx_`, path, sheet_i, limits, shim, col_names, col_types, na, trim_ws, guess_max, progress)
}

zip_xml <- function(zip_path, file_path) {
invisible(.Call(`_readxl_zip_xml`, zip_path, file_path))
}
3 changes: 3 additions & 0 deletions src/CellLimits.h
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
#define R_NO_REMAP
#define STRICT_R_HEADERS
jennybc marked this conversation as resolved.
Show resolved Hide resolved

#ifndef READXL_CELLLIMITS_
#define READXL_CELLLIMITS_

Expand Down
5 changes: 4 additions & 1 deletion src/ColSpec.h
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
#define R_NO_REMAP
#define STRICT_R_HEADERS

#ifndef READXL_COLSPEC_
#define READXL_COLSPEC_

#include "StringSet.h"
#include <Rcpp.h>
jennybc marked this conversation as resolved.
Show resolved Hide resolved
#include "libxls/xls.h"
#include "StringSet.h"

enum CellType {
CELL_UNKNOWN,
Expand Down
19 changes: 12 additions & 7 deletions src/StringSet.h
Original file line number Diff line number Diff line change
@@ -1,8 +1,13 @@
#ifndef READXL_STRINGSET_
#define READXL_STRINGSET_
#define R_NO_REMAP
#define STRICT_R_HEADERS

#include <Rcpp.h>
#include <cpp11/strings.hpp>
#include "utils.h"
#include <set>
#include <vector>
Comment on lines +6 to +7
Copy link
Member

@jennybc jennybc Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to keep the system includes, e.g. <whatever> together and above the ones defined here, e.g. "utils.h"?

If you agree, that would be a good thing to check globally. I think that is (mostly?) how things started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for reminding me about this.
I rearranged some of the header files to avoid conflicts with Rcpp and cpp11 libraries so I need to reorganize them (or organize them back to how you had them). I ended up using the macros #define R_NO_REMAP and #define STRICT_R_HEADERS instead to avoid the conflicts, which I'll remove since they won't be needed anymore once this PR is closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state seems to be much more regular, but often puts the local headers before the system headers. It feels like a more conventional order would be: system, vendored code (libxls), written-by-us, which also reflects intended precedence. (Let me also acknowledge that this probably doesn't matter and is more a matter of style at this point.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes sorry! I should have reread your comment before executing based on my memory. I'll rearrange in that order.


#ifndef READXL_STRINGSET_
#define READXL_STRINGSET_
jennybc marked this conversation as resolved.
Show resolved Hide resolved

class StringSet
{
Expand All @@ -13,12 +18,12 @@ class StringSet
set_.insert(s);
}
StringSet(const std::vector<std::string> &s) {
for (std::vector<std::string>::const_iterator i = s.begin(); i != s.end(); ++i)
for (auto i = s.begin(); i != s.end(); ++i)
set_.insert(*i);
}
StringSet(const Rcpp::CharacterVector &s) {
for (Rcpp::CharacterVector::const_iterator i = s.begin(); i != s.end(); ++i)
set_.insert(Rcpp::as<std::string>(*i));
StringSet(const cpp11::strings &s) {
for (auto i = s.begin(); i != s.end(); ++i)
set_.insert(cpp11::as_cpp<std::string>(*i));
}
bool contains(const std::string &s) const {
return set_.find(s) != set_.end();
Expand Down
8 changes: 5 additions & 3 deletions src/XlsWorkBook.cpp
Original file line number Diff line number Diff line change
@@ -1,15 +1,17 @@
#include <Rcpp.h>
#define R_NO_REMAP
#define STRICT_R_HEADERS

#include "XlsWorkBook.h"
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
[[cpp11::register]]
CharacterVector xls_sheets(std::string path) {
XlsWorkBook wb(path);
return wb.sheets();
}

// [[Rcpp::export]]
[[cpp11::register]]
std::set<int> xls_date_formats(std::string path) {
return XlsWorkBook(path).dateFormats();
}
5 changes: 4 additions & 1 deletion src/XlsWorkBook.h
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
#define R_NO_REMAP
#define STRICT_R_HEADERS

#ifndef READXL_XLSWORKBOOK_
#define READXL_XLSWORKBOOK_

#include "ColSpec.h"
#include <Rcpp.h>
#include "libxls/xls.h"
#include "ColSpec.h"
#include "utils.h"

class XlsWorkBook {
Expand Down
2 changes: 1 addition & 1 deletion src/XlsWorkSheet.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#include "libxls/xls.h"
using namespace Rcpp;

// [[Rcpp::export]]
[[cpp11::register]]
List read_xls_(std::string path, int sheet_i,
IntegerVector limits, bool shim,
RObject col_names, RObject col_types,
Expand Down
6 changes: 3 additions & 3 deletions src/XlsxWorkBook.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,17 @@
#include "XlsxWorkBook.h"
using namespace Rcpp;

// [[Rcpp::export]]
[[cpp11::register]]
CharacterVector xlsx_sheets(std::string path) {
return XlsxWorkBook(path).sheets();
}

// [[Rcpp::export]]
[[cpp11::register]]
std::vector<std::string> xlsx_strings(std::string path) {
return XlsxWorkBook(path).stringTable();
}

// [[Rcpp::export]]
[[cpp11::register]]
std::set<int> xlsx_date_formats(std::string path) {
return XlsxWorkBook(path).dateFormats();
}
10 changes: 7 additions & 3 deletions src/XlsxWorkBook.h
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
#ifndef READXL_XLSXWORKBOOK_
#define READXL_XLSXWORKBOOK_

#include <Rcpp.h>
#include "rapidxml.h"
#include "ColSpec.h"
#include "XlsxString.h"
#include "utils.h"
#include "zip.h"
#include <Rcpp.h>

#ifndef READXL_XLSXWORKBOOK_
#define READXL_XLSXWORKBOOK_

#define R_NO_REMAP
#define STRICT_R_HEADERS

class XlsxWorkBook {

Expand Down
4 changes: 2 additions & 2 deletions src/XlsxWorkSheet.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,14 @@
#include "utils.h"
using namespace Rcpp;

// [[Rcpp::export]]
[[cpp11::register]]
IntegerVector parse_ref(std::string ref) {
std::pair<int,int> parsed = parseRef(ref.c_str());

return IntegerVector::create(parsed.first, parsed.second);
}

// [[Rcpp::export]]
[[cpp11::register]]
List read_xlsx_(std::string path, int sheet_i,
IntegerVector limits, bool shim,
RObject col_names, RObject col_types,
Expand Down
93 changes: 93 additions & 0 deletions src/cpp11.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
// Generated by cpp11: do not edit by hand
// clang-format off

#include <cpp11/R.hpp>
#include <Rcpp.h>
using namespace Rcpp;
#include "cpp11/declarations.hpp"

// XlsWorkBook.cpp
CharacterVector xls_sheets(std::string path);
extern "C" SEXP _readxl_xls_sheets(SEXP path) {
BEGIN_CPP11
return cpp11::as_sexp(xls_sheets(cpp11::as_cpp<cpp11::decay_t<std::string>>(path)));
END_CPP11
}
// XlsWorkBook.cpp
std::set<int> xls_date_formats(std::string path);
extern "C" SEXP _readxl_xls_date_formats(SEXP path) {
BEGIN_CPP11
return cpp11::as_sexp(xls_date_formats(cpp11::as_cpp<cpp11::decay_t<std::string>>(path)));
END_CPP11
}
// XlsWorkSheet.cpp
List read_xls_(std::string path, int sheet_i, IntegerVector limits, bool shim, RObject col_names, RObject col_types, std::vector<std::string> na, bool trim_ws, int guess_max, bool progress);
extern "C" SEXP _readxl_read_xls_(SEXP path, SEXP sheet_i, SEXP limits, SEXP shim, SEXP col_names, SEXP col_types, SEXP na, SEXP trim_ws, SEXP guess_max, SEXP progress) {
BEGIN_CPP11
return cpp11::as_sexp(read_xls_(cpp11::as_cpp<cpp11::decay_t<std::string>>(path), cpp11::as_cpp<cpp11::decay_t<int>>(sheet_i), cpp11::as_cpp<cpp11::decay_t<IntegerVector>>(limits), cpp11::as_cpp<cpp11::decay_t<bool>>(shim), cpp11::as_cpp<cpp11::decay_t<RObject>>(col_names), cpp11::as_cpp<cpp11::decay_t<RObject>>(col_types), cpp11::as_cpp<cpp11::decay_t<std::vector<std::string>>>(na), cpp11::as_cpp<cpp11::decay_t<bool>>(trim_ws), cpp11::as_cpp<cpp11::decay_t<int>>(guess_max), cpp11::as_cpp<cpp11::decay_t<bool>>(progress)));
END_CPP11
}
// XlsxWorkBook.cpp
CharacterVector xlsx_sheets(std::string path);
extern "C" SEXP _readxl_xlsx_sheets(SEXP path) {
BEGIN_CPP11
return cpp11::as_sexp(xlsx_sheets(cpp11::as_cpp<cpp11::decay_t<std::string>>(path)));
END_CPP11
}
// XlsxWorkBook.cpp
std::vector<std::string> xlsx_strings(std::string path);
extern "C" SEXP _readxl_xlsx_strings(SEXP path) {
BEGIN_CPP11
return cpp11::as_sexp(xlsx_strings(cpp11::as_cpp<cpp11::decay_t<std::string>>(path)));
END_CPP11
}
// XlsxWorkBook.cpp
std::set<int> xlsx_date_formats(std::string path);
extern "C" SEXP _readxl_xlsx_date_formats(SEXP path) {
BEGIN_CPP11
return cpp11::as_sexp(xlsx_date_formats(cpp11::as_cpp<cpp11::decay_t<std::string>>(path)));
END_CPP11
}
// XlsxWorkSheet.cpp
IntegerVector parse_ref(std::string ref);
extern "C" SEXP _readxl_parse_ref(SEXP ref) {
BEGIN_CPP11
return cpp11::as_sexp(parse_ref(cpp11::as_cpp<cpp11::decay_t<std::string>>(ref)));
END_CPP11
}
// XlsxWorkSheet.cpp
List read_xlsx_(std::string path, int sheet_i, IntegerVector limits, bool shim, RObject col_names, RObject col_types, std::vector<std::string> na, bool trim_ws, int guess_max, bool progress);
extern "C" SEXP _readxl_read_xlsx_(SEXP path, SEXP sheet_i, SEXP limits, SEXP shim, SEXP col_names, SEXP col_types, SEXP na, SEXP trim_ws, SEXP guess_max, SEXP progress) {
BEGIN_CPP11
return cpp11::as_sexp(read_xlsx_(cpp11::as_cpp<cpp11::decay_t<std::string>>(path), cpp11::as_cpp<cpp11::decay_t<int>>(sheet_i), cpp11::as_cpp<cpp11::decay_t<IntegerVector>>(limits), cpp11::as_cpp<cpp11::decay_t<bool>>(shim), cpp11::as_cpp<cpp11::decay_t<RObject>>(col_names), cpp11::as_cpp<cpp11::decay_t<RObject>>(col_types), cpp11::as_cpp<cpp11::decay_t<std::vector<std::string>>>(na), cpp11::as_cpp<cpp11::decay_t<bool>>(trim_ws), cpp11::as_cpp<cpp11::decay_t<int>>(guess_max), cpp11::as_cpp<cpp11::decay_t<bool>>(progress)));
END_CPP11
}
// zip.cpp
void zip_xml(const std::string& zip_path, const std::string& file_path);
extern "C" SEXP _readxl_zip_xml(SEXP zip_path, SEXP file_path) {
BEGIN_CPP11
zip_xml(cpp11::as_cpp<cpp11::decay_t<const std::string&>>(zip_path), cpp11::as_cpp<cpp11::decay_t<const std::string&>>(file_path));
return R_NilValue;
END_CPP11
}

extern "C" {
static const R_CallMethodDef CallEntries[] = {
{"_readxl_parse_ref", (DL_FUNC) &_readxl_parse_ref, 1},
{"_readxl_read_xls_", (DL_FUNC) &_readxl_read_xls_, 10},
{"_readxl_read_xlsx_", (DL_FUNC) &_readxl_read_xlsx_, 10},
{"_readxl_xls_date_formats", (DL_FUNC) &_readxl_xls_date_formats, 1},
{"_readxl_xls_sheets", (DL_FUNC) &_readxl_xls_sheets, 1},
{"_readxl_xlsx_date_formats", (DL_FUNC) &_readxl_xlsx_date_formats, 1},
{"_readxl_xlsx_sheets", (DL_FUNC) &_readxl_xlsx_sheets, 1},
{"_readxl_xlsx_strings", (DL_FUNC) &_readxl_xlsx_strings, 1},
{"_readxl_zip_xml", (DL_FUNC) &_readxl_zip_xml, 2},
{NULL, NULL, 0}
};
}

extern "C" void R_init_readxl(DllInfo* dll){
R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
R_useDynamicSymbols(dll, FALSE);
R_forceSymbols(dll, TRUE);
}
12 changes: 9 additions & 3 deletions src/utils.h
Original file line number Diff line number Diff line change
@@ -1,8 +1,14 @@
#define R_NO_REMAP
#define STRICT_R_HEADERS

#ifndef UTILS_
#define UTILS_

#include <cpp11/protect.hpp>
#include <sstream>
#include <cerrno>
#include "StringSet.h"



// The date offset needed to align Excel dates with R's use of 1970-01-01
// depends on the "date system".
Expand Down Expand Up @@ -64,7 +70,7 @@ inline double POSIXctFromSerial(double xlDate, bool is1904) {
xlDate = (xlDate < 60) ? xlDate + 1 : -1;
}
if (xlDate < 0) {
Rcpp::warning("NA inserted for impossible 1900-02-29 datetime");
cpp11::warning("NA inserted for impossible 1900-02-29 datetime");
return NA_REAL;
} else {
return dateRound((xlDate - dateOffset(is1904)) * 86400);
Expand All @@ -81,7 +87,7 @@ inline std::pair<int, int> parseRef(const char* ref) {
} else if (*cur >= 'A' && *cur <= 'Z') {
col = 26 * col + (*cur - 'A' + 1);
} else {
Rcpp::stop("Invalid character '%s' in cell ref '%s'", *cur, ref);
cpp11::stop("Invalid character '%s' in cell ref '%s'", *cur, ref);
}
}

Expand Down
27 changes: 10 additions & 17 deletions src/zip.cpp
Original file line number Diff line number Diff line change
@@ -1,20 +1,15 @@
#include <Rcpp.h>

#include <cpp11/function.hpp>
#include <cpp11/raws.hpp>
#include <cpp11/logicals.hpp>
#include <cpp11/as.hpp>
#include "zip.h"
#include "rapidxml_print.h"

using namespace Rcpp;

Function readxl(const std::string& fun){
Environment env = Environment::namespace_env("readxl");
return env[fun];
}

std::string zip_buffer(const std::string& zip_path,
const std::string& file_path) {
Rcpp::Function zip_buffer = readxl("zip_buffer");
cpp11::function zip_buffer = cpp11::package("readxl")["zip_buffer"];

Rcpp::RawVector xml = Rcpp::as<Rcpp::RawVector>(zip_buffer(zip_path, file_path));
cpp11::raws xml(zip_buffer(zip_path, file_path));
std::string buffer(RAW(xml), RAW(xml) + xml.size());
buffer.push_back('\0');

Expand All @@ -23,10 +18,8 @@ std::string zip_buffer(const std::string& zip_path,

bool zip_has_file(const std::string& zip_path,
const std::string& file_path) {
Rcpp::Function zip_has_file = readxl("zip_has_file");

LogicalVector res = wrap<LogicalVector>(zip_has_file(zip_path, file_path));
return res[0];
cpp11::function zip_has_file = cpp11::package("readxl")["zip_has_file"];
return zip_has_file(zip_path, file_path);
}

std::string xml_print(std::string xml) {
Expand All @@ -41,10 +34,10 @@ std::string xml_print(std::string xml) {
return s;
}

// [[Rcpp::export]]
[[cpp11::register]]
void zip_xml(const std::string& zip_path,
const std::string& file_path) {

std::string buffer = zip_buffer(zip_path, file_path);
Rcout << xml_print(buffer);
Rprintf("%s", xml_print(buffer).c_str());
}
Loading