ARROW-12017: [R] [Documentation] Make proper developing arrow docs
Closes apache#9898 from jonkeane/ARROW-12017-dev-docs

Lead-authored-by: Jonathan Keane <jkeane@gmail.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
jonkeane and nealrichardson committed Apr 15, 2021
1 parent 818c57c commit 1c0641d
Showing 10 changed files with 724 additions and 58 deletions.
92 changes: 92 additions & 0 deletions dev/tasks/r/github.devdocs.yml
@@ -0,0 +1,92 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# NOTE: must set "Crossbow" as name to have the badge links working in the
# github comment reports!
name: Crossbow

on:
  push:
    branches:
      - "*-github-*"

jobs:
  devdocs:
    name: 'R devdocs {{ "${{ matrix.os }}" }} system install: {{ "${{ matrix.system-install }}" }}'
    runs-on: {{ "${{ matrix.os }}" }}
    strategy:
      fail-fast: false
      matrix:
        os: [macOS-latest, ubuntu-20.04]
        # should the install method install libarrow into a system directory
        # or a temporary directory. old is the same as a temporary
        # directory, but an old version of libarrow will be installed
        # into a system directory first (to make sure we can link correctly when building)
        system-install: [true, false]

    steps:
      - name: Checkout Arrow
        run: |
          git clone --no-checkout {{ arrow.remote }} arrow
          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
          git -C arrow checkout FETCH_HEAD
          git -C arrow submodule update --init --recursive
      - uses: r-lib/actions/setup-r@v1
      - uses: r-lib/actions/setup-pandoc@v1
      - name: Install knitr, rmarkdown
        run: |
          install.packages(c("rmarkdown", "knitr", "sessioninfo"))
        shell: Rscript {0}
      - name: Session info
        run: |
          options(width = 100)
          pkgs <- installed.packages()[, "Package"]
          sessioninfo::session_info(pkgs, include_base = TRUE)
        shell: Rscript {0}
      - name: Write the install script
        env:
          RUN_DEVDOCS: TRUE
          DEVDOCS_MACOS: {{ "${{contains(matrix.os, 'macOS')}}" }}
          DEVDOCS_UBUNTU: {{ "${{contains(matrix.os, 'ubuntu')}}" }}
          DEVDOCS_SYSTEM_INSTALL: {{ "${{contains(matrix.system-install, 'true')}}" }}
          DEVDOCS_PRIOR_SYSTEM_INSTALL: {{ "${{contains(matrix.system-install, 'old')}}" }}
        run: |
          # This isn't actually rendering the docs, but will save arrow/r/vignettes/script.sh
          # which can be sourced to install arrow.
          rmarkdown::render("arrow/r/vignettes/developing.Rmd")
        shell: Rscript {0}
      - name: Install from the devdocs
        env:
          LIBARROW_BINARY: FALSE
          ARROW_R_DEV: TRUE
        run: bash arrow/r/vignettes/script.sh
        shell: bash
      - name: Ensure that the Arrow package is loadable and we have the correct one
        run: |
          echo $LD_LIBRARY_PATH
          R --no-save <<EOF
          Sys.getenv("LD_LIBRARY_PATH")
          library(arrow)
          arrow_info()
          EOF
        shell: bash -l {0}
      - name: Save the install script
        uses: actions/upload-artifact@v2
        with:
          name: {{ "devdocs-script_os-${{ matrix.os }}_sysinstall-${{ matrix.system-install }}" }}
          path: arrow/r/vignettes/script.sh
        if: always()
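To reproduce one matrix cell of this workflow outside CI, the `DEVDOCS_*` values passed to the "Write the install script" step can be computed the same way the `contains()` expressions do. The bash sketch below assumes that `contains()` is a case-insensitive substring test whose result renders as `true` or `false`; the helpers `contains_ci` and `devdocs_env` are hypothetical and not part of the Arrow repository. `RUN_DEVDOCS` is set to a constant in the workflow, so it is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Arrow): print the DEVDOCS_* values the
# "Write the install script" step would receive for one matrix cell.
# Assumes contains() is a case-insensitive substring test printing true/false.

# contains_ci HAYSTACK NEEDLE -> prints "true" or "false"
contains_ci() {
  h=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  n=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$h" in
    *"$n"*) echo true ;;
    *) echo false ;;
  esac
}

# devdocs_env OS SYSTEM_INSTALL -> NAME=value lines mirroring the step's env
devdocs_env() {
  echo "DEVDOCS_MACOS=$(contains_ci "$1" macOS)"
  echo "DEVDOCS_UBUNTU=$(contains_ci "$1" ubuntu)"
  echo "DEVDOCS_SYSTEM_INSTALL=$(contains_ci "$2" true)"
  # With the matrix above ([true, false]) this one is always false
  echo "DEVDOCS_PRIOR_SYSTEM_INSTALL=$(contains_ci "$2" old)"
}

# Example: the ubuntu-20.04 cell with system-install: true
devdocs_env ubuntu-20.04 true
```

Because the output is in `NAME=value` form, it could be exported before running the same `rmarkdown::render()` call locally; whether that fully matches the CI job depends on how `developing.Rmd` reads these variables, which this sketch does not model.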
4 changes: 4 additions & 0 deletions dev/tasks/tasks.yml
@@ -1389,6 +1389,10 @@ tasks:
ci: github
template: r/github.macos-linux.local.yml

test-r-devdocs:
ci: github
template: r/github.devdocs.yml

test-r-rhub-ubuntu-gcc-release:
ci: azure
template: r/azure.linux.yml
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -1016,7 +1016,7 @@ services:
shm_size: *shm-size
environment:
LIBARROW_DOWNLOAD: "false"
ARROW_HOME: "/arrow"
ARROW_SOURCE_HOME: "/arrow"
ARROW_R_DEV: ${ARROW_R_DEV}
# To test for CRAN release, delete ^^ these two env vars so we download the Apache release
ARROW_USE_PKG_CONFIG: "false"
5 changes: 4 additions & 1 deletion r/NEWS.md
@@ -54,7 +54,8 @@ Over 100 functions can now be called on Arrow objects inside a `dplyr` verb:
* Similarly, `Schema` can now be edited by assigning in new types. This enables using the CSV reader to detect the schema of a file, modify the `Schema` object for any columns that you want to read in as a different type, and then use that `Schema` to read the data.
* Better validation when creating a `Table` with a schema, with columns of different lengths, and with scalar value recycling
* Reading Parquet files in Japanese or other multi-byte locales on Windows no longer hangs (workaround for a [bug in libstdc++](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723); thanks @yutannihilation for the persistence in discovering this!)
* If you attempt to read string data that has embedded nul (`\0`) characters, the error message now informs you that you can set `options(arrow.skip_nul = TRUE)` to strip them out. It is not recommended to set this option by default since this code path is sigificantly slower, and most string data does not contain nuls.
* If you attempt to read string data that has embedded nul (`\0`) characters, the error message now informs you that you can set `options(arrow.skip_nul = TRUE)` to strip them out. It is not recommended to set this option by default since this code path is significantly slower, and most string data does not contain nuls.
* `read_json_arrow()` now accepts a schema: `read_json_arrow("file.json", schema = schema(col_a = float64(), col_b = string()))`

## Installation and configuration

@@ -64,6 +65,8 @@ Over 100 functions can now be called on Arrow objects inside a `dplyr` verb:
* Setting the `ARROW_DEFAULT_MEMORY_POOL` environment variable to switch memory allocators now works correctly when the Arrow C++ library has been statically linked (as is usually the case when installing from CRAN).
* The `arrow_info()` function now reports on the additional optional features, as well as the detected SIMD level. If key features or compression libraries are not enabled in the build, `arrow_info()` will refer to the installation vignette for guidance on how to install a more complete build, if desired.
* If you attempt to read a file that was compressed with a codec that your Arrow build does not contain support for, the error message now will tell you how to reinstall Arrow with that feature enabled.
* A new vignette about developer environment setup `vignette("developing", package = "arrow")`.
* When building from source, you can use the environment variable `ARROW_HOME` to point to a specific directory where the Arrow libraries are. This is similar to passing `INCLUDE_DIR` and `LIB_DIR`.

# arrow 3.0.0

15 changes: 15 additions & 0 deletions r/_pkgdown.yml
@@ -57,6 +57,21 @@ navbar:
href: https://arrow.apache.org/docs/python
- text: R
href: index.html
articles:
text: Articles
menu:
- text: Installing the Arrow Package on Linux
href: articles/install.html
- text: Working with Arrow Datasets and dplyr
href: articles/dataset.html
- text: Working with Cloud Storage (S3)
href: articles/fs.html
- text: Apache Arrow in Python and R with reticulate
href: articles/python.html
- text: Connecting to Flight RPC Servers
href: articles/flight.html
- text: Arrow R Developer Guide
href: articles/developing.html
reference:
- title: Multi-file datasets
contents:
10 changes: 7 additions & 3 deletions r/configure
@@ -66,8 +66,12 @@ if [ "$FORCE_AUTOBREW" = "true" ] || [ "$FORCE_BUNDLED_BUILD" = "true" ]; then
fi

# Note that cflags may be empty in case of success
if [ "$INCLUDE_DIR" ] || [ "$LIB_DIR" ]; then
echo "*** Using INCLUDE_DIR/LIB_DIR"
if [ "$ARROW_HOME" ]; then
echo "*** Using ARROW_HOME as the source of libarrow"
PKG_CFLAGS="-I$ARROW_HOME/include $PKG_CFLAGS"
PKG_DIRS="-L$ARROW_HOME/lib"
elif [ "$INCLUDE_DIR" ] && [ "$LIB_DIR" ]; then
echo "*** Using INCLUDE_DIR/LIB_DIR as the source of libarrow"
PKG_CFLAGS="-I$INCLUDE_DIR $PKG_CFLAGS"
PKG_DIRS="-L$LIB_DIR"
else
@@ -80,7 +84,7 @@ else
# TODO: what about --libs-only-other?
fi

if [ "$PKGCONFIG_CFLAGS" ] || [ "$PKGCONFIG_LIBS" ]; then
if [ "$PKGCONFIG_CFLAGS" ] && [ "$PKGCONFIG_LIBS" ]; then
echo "*** Arrow C++ libraries found via pkg-config"
PKG_CFLAGS="$PKGCONFIG_CFLAGS"
PKG_LIBS=${PKGCONFIG_LIBS}
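Taken together, the two `r/configure` hunks change how the libarrow location is chosen: `ARROW_HOME` is now checked first, `INCLUDE_DIR`/`LIB_DIR` are used only when both are set, and the pkg-config result is accepted only when both `PKGCONFIG_CFLAGS` and `PKGCONFIG_LIBS` are non-empty. The sketch below restates that order as one flat `if`/`elif` chain, assuming (as the hunk context suggests) that the pkg-config test sits inside the `else` branch. `libarrow_source` is a hypothetical helper that only reports the chosen branch instead of setting `PKG_CFLAGS`/`PKG_DIRS`/`PKG_LIBS`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how r/configure picks the libarrow source after this
# commit. It flattens the real nested if/else into one chain and prints the
# chosen branch instead of setting PKG_CFLAGS / PKG_DIRS / PKG_LIBS.
libarrow_source() {
  if [ "$ARROW_HOME" ]; then
    # New first choice: headers in $ARROW_HOME/include, libs in $ARROW_HOME/lib
    echo "ARROW_HOME: -I$ARROW_HOME/include -L$ARROW_HOME/lib"
  elif [ "$INCLUDE_DIR" ] && [ "$LIB_DIR" ]; then
    # Previously either variable alone was enough (||); now both are required
    echo "INCLUDE_DIR/LIB_DIR: -I$INCLUDE_DIR -L$LIB_DIR"
  elif [ "$PKGCONFIG_CFLAGS" ] && [ "$PKGCONFIG_LIBS" ]; then
    # pkg-config is accepted only if both cflags and libs came back non-empty
    echo "pkg-config"
  else
    # Remaining branch of r/configure, not shown in this diff
    echo "other"
  fi
}

# Start from a clean slate so inherited environment variables don't interfere
unset ARROW_HOME INCLUDE_DIR LIB_DIR PKGCONFIG_CFLAGS PKGCONFIG_LIBS

# Subshells keep each example's variables from leaking into the next one
(ARROW_HOME=/opt/arrow; INCLUDE_DIR=/x/include; LIB_DIR=/x/lib; libarrow_source)
# prints: ARROW_HOME: -I/opt/arrow/include -L/opt/arrow/lib
(INCLUDE_DIR=/x/include; libarrow_source)
# prints: other (LIB_DIR is missing)
(PKGCONFIG_CFLAGS=-I/usr/include; libarrow_source)
# prints: other (PKGCONFIG_LIBS is empty)
```

The last two calls would have matched the old `||` conditions; under the new `&&` checks they fall through instead of building against a half-specified location.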
65 changes: 65 additions & 0 deletions r/pkgdown/extra.js
@@ -0,0 +1,65 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

(function () {
  // Load the rmarkdown tabset script
  var script = document.createElement("script");
  script.type = "text/javascript";
  script.async = true;
  script.src =
    "https://cdn.jsdelivr.net/gh/rstudio/rmarkdown@47d837d3d9cd5e8e212b05767454f058db7d2789/inst/rmd/h/navigation-1.1/tabsets.js";
  script.integrity = "sha256-Rs54TE1FCN1uLM4f7VQEMiRTl1Ia7TiQLkMruItwV+Q=";
  script.crossOrigin = "anonymous";

  // Run the processing as the onload callback
  script.onload = () => {
    // Monkey patch the .html method to use the .text method
    $(document).ready(function () {
      (function ($) {
        $.fn.html = function (content) {
          return this.text();
        };
      })(jQuery);

      window.buildTabsets("toc");
    });

    $(document).ready(function () {
      $(".tabset-dropdown > .nav-tabs > li").click(function () {
        $(this).parent().toggleClass("nav-tabs-open");
      });
    });

    $(document).ready(function () {
      /**
       * The tabset creation above sometimes relies on empty headers to stop the
       * tabbing. Though they shouldn't be included in the TOC in the first place,
       * this will remove empty headers from the TOC after it's created.
       */

      // find all the empty <a> elements and remove them (and their parents)
      var empty_a = $("#toc").find("a").filter(":empty");
      empty_a.parent().remove();

      // now find any empty <ul>s and remove them too
      var empty_ul = $("#toc").find("ul").filter(":empty");
      empty_ul.remove();
    });
  };

  document.head.appendChild(script);
})();
2 changes: 1 addition & 1 deletion r/tools/nixlibs.R
@@ -261,7 +261,7 @@ apache_download <- function(destfile, n_mirrors = 3) {
downloaded
}

find_local_source <- function(arrow_home = Sys.getenv("ARROW_HOME", "..")) {
find_local_source <- function(arrow_home = Sys.getenv("ARROW_SOURCE_HOME", "..")) {
if (file.exists(paste0(arrow_home, "/cpp/src/arrow/api.h"))) {
# We're in a git checkout of arrow, so we can build it
cat("*** Found local C++ source\n")
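After this change, `find_local_source()` reads `ARROW_SOURCE_HOME` (default `..`) to locate a git checkout of Arrow, in line with the `ARROW_HOME` to `ARROW_SOURCE_HOME` rename in `docker-compose.yml`, while `r/configure` now uses `ARROW_HOME` for an installed libarrow. The bash sketch below restates only the visible check; `find_local_source_sh` is a hypothetical name, and its return status is this sketch's own convention, since the rest of the R function is not shown in the diff. Like `Sys.getenv("ARROW_SOURCE_HOME", "..")`, it uses the `..` default only when the variable is unset.

```shell
#!/usr/bin/env bash
# Hypothetical shell restatement of the check in find_local_source()
# (r/tools/nixlibs.R) after this commit. ${VAR-default} substitutes ".."
# only when ARROW_SOURCE_HOME is unset, mirroring Sys.getenv()'s unset default.
find_local_source_sh() {
  arrow_home="${ARROW_SOURCE_HOME-..}"
  # A git checkout of Arrow is recognized by the presence of the C++ API header
  if [ -e "$arrow_home/cpp/src/arrow/api.h" ]; then
    echo "*** Found local C++ source"
    return 0
  fi
  return 1
}

# Example: a throwaway directory laid out like an Arrow checkout
tmp=$(mktemp -d)
mkdir -p "$tmp/cpp/src/arrow"
touch "$tmp/cpp/src/arrow/api.h"
(ARROW_SOURCE_HOME="$tmp"; find_local_source_sh)
rm -rf "$tmp"
```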