knitr::purl does not retrieve comments that starts with #| in chunks #2268

Closed
LuisLauM opened this issue Jul 10, 2023 · 10 comments

@LuisLauM

LuisLauM commented Jul 10, 2023

R 4.3.1
knitr 1.43
OS: Windows 10 64bits

Hello.

I think there is a bug in the knitr::purl function. While working with some Quarto files, I realized that knitr::purl excludes from its output any comments that start immediately with #|. For example:
Input:

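library(knitr)  # provides purl()

# text1: a blank line separates the chunk header from the #| lines
text1 <- "```{r, fig.cap = 'Hello world.'}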

#| Hello world 1
#| Hello world 2
mtcars$cyl
```"

# text2: the #| lines come right after the chunk header
text2 <- "```{r, fig.cap = 'Hello world.'}
#| Hello world 1
#| Hello world 2
mtcars$cyl
```"

cat(purl(text = text1))
cat(purl(text = text2))

I would expect both texts to produce (almost) the same result; I say "almost" because text1 may produce an extra line break, but nothing more. However, with text2, knitr::purl omits all the comment lines.


By filing an issue to this repo, I promise that

  • [OK] I have fully read the issue guide at https://yihui.org/issue/.
  • [OK] I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • [OK] I have learned the GitHub Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@LuisLauM changed the title knitr::purl does not retrieve comments that starts with #| in chunks Jul 10, 2023
@LuisLauM
Author

I think the core of the difference lies in the body of knitr:::partition_chunk, where I can see different behaviour depending on whether or not the text starts directly with a comment header.

@cderv
Collaborator

cderv commented Jul 11, 2023

Hi!

Thanks for opening an issue.

First, some clarifications.

#| is a special comment syntax for passing chunk options over multiple lines or in YAML syntax (https://yihui.org/en/2022/01/knitr-news/):

Since knitr 1.35, we have provided an alternative way to write chunk options. That is, you can write them inside a chunk after the special comments #|.

Using such comments for options is expected in Quarto files, since this is the recommended way to pass options to code cells there (https://quarto.org/docs/computations/r.html#chunk-options).

When you do this

#| Hello world 1

it is not a valid chunk option. Are these lines intended to be regular comments? If so, you shouldn't use #|.

I think the core of the difference lies in the body of knitr:::partition_chunk, where I can see different behaviour depending on whether or not the text starts directly with a comment header.

This difference is intended: a chunk whose body starts with #| lines is treated as having in-chunk options to be parsed.
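As a rough illustration of that rule, here is a simplified base-R sketch. The split_chunk() helper is hypothetical and is not knitr's actual code (the real logic lives in knitr:::partition_chunk() and also parses the options as YAML); it only shows the idea that the run of #| lines at the very top of a chunk body is taken as options. With text2 the #| lines come first and are consumed as options; with text1 the leading blank line means no options are found, so the #| lines stay in the code.

```r
# Hypothetical helper illustrating the "leading #| lines are options" rule.
# It does not parse the options as YAML, unlike knitr:::partition_chunk().
split_chunk <- function(code) {
  is_opt <- startsWith(code, "#|")
  # Only the leading run of #| lines counts as in-chunk options
  n_opt <- if (all(is_opt)) length(code) else which(!is_opt)[1] - 1
  list(
    # Drop the "#|" prefix and any following spaces
    options = trimws(substring(code[seq_len(n_opt)], 3), which = "left"),
    # Everything after the option block is ordinary code
    code    = code[seq_len(length(code) - n_opt) + n_opt]
  )
}

# Body of text2: the #| lines come first, so they are read as options
split_chunk(c("#| Hello world 1", "#| Hello world 2", "mtcars$cyl"))
# Body of text1: a blank line comes first, so nothing is read as options
split_chunk(c("", "#| Hello world 1", "#| Hello world 2", "mtcars$cyl"))
```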

This leads to my question about context: are you trying to purl a Quarto document, e.g., knitr::purl("mydoc.qmd")? Or is this unrelated to Quarto and the in-chunk option syntax altogether?

Thank you

@LuisLauM
Author

Hello.

I am trying to extract these comments in order to retrieve the references (i.e., those that start with @) in the captions of (Quarto) chunks. So I started by getting the header lines that start with #|, and then I planned to search within the .cap= part.

@cderv
Collaborator

cderv commented Jul 11, 2023

Where in your process do you need to purl() from some text?

knitr::purl() was initially designed to work with regular knitr chunks, with options in the chunk header inside curly braces.

We recently added support for purl-ing a .qmd file, but this expects cell options only inside the chunk, as YAML.

Example with test.qmd:

```{r}
#| fig-cap: Hello world
#| echo: false
#| eval: true
mtcars$cyl
```

Running knitr::purl("test.qmd") gives this R file:

#| fig-cap: Hello world
#| echo: false
#| eval: true
mtcars$cyl

The behavior changes because knitr:::is_quarto() is TRUE internally.

It seems we do not support mixing in-chunk options with in-header options: we do not merge the two in our tangle = TRUE process.

Why do you have both types of option syntax?

It would be helpful to really understand how you are working, and what you are trying to do. It seems you want to build around Quarto, and maybe using knitr directly to parse the document is not the best move.

@LuisLauM
Author

In a nutshell, I was trying to find a way to extract the captions from chunks, whether they come from Rmd or Qmd text. Now that I better understand how purl works, I think I will have to look for another way (with regular expressions).
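Along those lines, here is a minimal regex-only sketch in base R. The extract_captions() helper is hypothetical (not part of knitr or Quarto): it collects figure captions written either as fig.cap = '...' in a chunk header or as #| fig-cap: (or #| fig.cap:) inside the chunk, assuming one caption per line and no quote characters inside header-style captions.

```r
# Hypothetical helper: regex-based caption extraction from raw .Rmd/.qmd lines.
# Assumes single-line captions and no quotes inside header-style captions.
extract_captions <- function(lines) {
  # Header form, e.g. ```{r, fig.cap = 'Hello world.'}
  hdr_pat <- "^```[{]r.*fig[.]cap *= *['\"]([^'\"]*)['\"]"
  # In-chunk YAML form, e.g. #| fig-cap: Hello world
  yml_pat <- "^#[|] *fig[-.]cap: *(.*)$"
  caps <- character(0)
  for (line in lines) {
    h <- regmatches(line, regexec(hdr_pat, line))[[1]]
    y <- regmatches(line, regexec(yml_pat, line))[[1]]
    if (length(h) > 0) {
      caps <- c(caps, h[2])
    } else if (length(y) > 0) {
      # Strip optional surrounding quotes from the YAML value
      caps <- c(caps, sub("^['\"](.*)['\"]$", "\\1", trimws(y[2])))
    }
  }
  caps
}

src <- c(
  "```{r, fig.cap = 'Hello world.'}", "mtcars$cyl", "```",
  "```{r}", "#| fig-cap: \"Cylinders, see @ref1\"", "#| echo: false", "mtcars$cyl", "```"
)
extract_captions(src)
```

Working on raw lines avoids purl() entirely, at the cost of missing captions that span several lines; the same patterns could be extended to tbl-cap / tbl.cap.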

Thanks for the help and time. 👍

@cderv
Collaborator

cderv commented Jul 11, 2023

There is a really good package to parse Rmd content called parsermd (https://github.com/rundel/parsermd/)

But it is not yet adapted to support the new option format:

When it does, I think it could be useful for your use case.

@cderv
Collaborator

cderv commented Jul 11, 2023

@yihui I am leaving this open in case you feel we should still improve our purl output. Currently we parse in-header and in-chunk options, but we don't merge them before writing the tangled output. I wonder if we should improve this... 🤔

@LuisLauM
Author

Thank you. Yes, my goal is to extract all citations made within an Rmd or Qmd document, while taking care NOT to include places where a citation cannot appear (e.g., within a chunk, search only the fig.cap or tbl.cap part, and ignore things like xlab = "@thisisisnotaref").
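One way to respect that constraint is to scan only caption strings (however they were extracted) for @keys, so code such as xlab = "@thisisisnotaref" is never looked at. The extract_citations() helper below is hypothetical, and its key pattern is a simplified assumption loosely modelled on Pandoc's citation syntax (punctuation is kept only between word characters, so a trailing period is dropped); the Pandoc manual has the exact rules.

```r
# Hypothetical helper: unique citation keys (without "@") found in captions.
# Simplified key grammar, not Pandoc's full specification.
extract_citations <- function(captions) {
  # "@" must not follow a word character (skips e-mail addresses);
  # punctuation inside a key must be followed by more word characters.
  pat <- "(?<![[:alnum:]_])@[[:alnum:]_]+(?:[#$%&+?<>~/:.-][[:alnum:]_]+)*"
  keys <- unlist(regmatches(captions, gregexpr(pat, captions, perl = TRUE)))
  unique(sub("^@", "", keys))
}

extract_citations(c(
  "Cylinders, see @ref1.",
  "Data from @doe:2020 and @ref1; contact me@example.com"
))
```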

@yihui closed this as completed in 66d7700 Aug 24, 2023
@yihui
Owner

yihui commented Aug 24, 2023

#| comments will be preserved now. Thanks!


This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024