Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a handler to improve YAML error message #2294

Merged
merged 20 commits into from
Oct 24, 2023

Conversation

pedropark99
Copy link
Contributor

@pedropark99 pedropark99 commented Sep 28, 2023

Hello! This PR seeks to fulfill the objective proposed at the issue #2226

This PR adds a tryCatch() block to the step of parsing the YAML with the chunk options, and uses the error handler to build a better error message, as proposed in the issue #2226 . The new error message is in the following format:

> knit("C:\\Users\\pedro\\Desktop\\test.Rmd")
Failed to parse YAML: 
Scanner error: mapping values are not allowed in this context at line 2, column 12


label: test multiqueries -5
 connection: db
             ^~~~~~

This error message was produced while knitting the following Rmd:

---
title: "Example"
format: pdf
engine: knitr
editor: visual
---

```{r}
# install.packages(c("DBI", "RSQLite"))
db = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
```


```{sql}
#| label: test multiqueries -5
#|  connection: db

select 1
```

However, this current format of the error message is still not complete. In other words, it only partially meets the full description given by @yihui in the mentioned issue. According to the full description given by @yihui , there are two groups of indeces (or indexes) that would help to improve the error message, which are:

  • The line and column indexes that point to the exact position inside the YAML where the error occured;
  • The line index (or label) of the code chunk that contains the YAML where the error occured;

The current format of the error message that I showed above only contains the first group of indexes (the line and column indexes inside the YAML). So I can still improve this version by including the line index (or label) of the code chunk. According to @yihui we could get the code chunk index by using a groups variable. I did quickly looked for a similar variable at the parent environment from the tryCatch() block, with ls(envir = parent.env(environment())), but I did not found this variable, or, a similar global variable from which I could extract the code chunk index.

But I am more than happy to implement this second index if you can give more instructions on how to get this code chunk index/label from inside the tryCatch() block 😉.

Furthermore, since the knitr codebase tries to stick with base R as much as possible I ended up using base::regexpr() to extract/parse the column and line index from the original error message. But I fell that the parsing is still a bit clunk, so, I think that this part of the code can be improved.

Another thing that we can improve/change is how to print/return the improved error message. Since is an error message, I did try to print the message with a stop() call, however, the message was not being printed correctly. That is why, I ended up chosing to just returning the error message at the end of the error handler.

@cderv cderv requested a review from yihui September 28, 2023 16:10
Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

According to @yihui we could get the code chunk index by using a groups variable. I did quickly looked for a similar variable at the parent environment from the tryCatch() block, with ls(envir = parent.env(environment())), but I did not found this variable, or, a similar global variable from which I could extract the code chunk index.

I meant to change

lapply(groups, function(g) {
to

  lapply(seq_along(groups), function(i) {
    g = groups[[i]]

Then you got the index i. Pass it to

parse_block(g[-1], g[1], params.src, markdown_mode)

Of course you need to add a new argument to parse_block(). Then further pass i to

parts = partition_chunk(engine, code)

After you get the index in partition_chunk(), you can call current_lines(i) to get the line numbers.

If that's not clear enough, feel free to leave this task behind. I can tweak the PR by myself. Thanks!

R/parser.R Outdated Show resolved Hide resolved
R/parser.R Outdated Show resolved Hide resolved
R/parser.R Outdated Show resolved Hide resolved
R/parser.R Outdated Show resolved Hide resolved
@yihui yihui self-assigned this Oct 11, 2023
@pedropark99
Copy link
Contributor Author

pedropark99 commented Oct 19, 2023

Hey @yihui ! First of all, thank you for the review!

OBS: There is one task in the CI pipeline that is failling because of an outdated R version which is not supported by a dependency. Because I understand that the fix to this error is out of this PR scope, I will leave this error to you 😉.

I made the changes you requested in your review.

After that, I did a quick "debug walking" through the parser code, to get a better understanding on how it works. Then, I followed your instructions on how to get the chunk lines. With my latest changes, the error message is now on the following format:

> knit("C:\\Users\\pedro\\Desktop\\test.Rmd")
In partition_chunk(engine, code, block_index) :
  Invalid YAML option format in chunk: 
Failed to parse YAML inside code chunk at lines 15-19. Scanner error: mapping values are not allowed in this context at line 2, column 7


connection: db
 label: test multiqueries -5
       ^~~~~~

I think that maybe the above message can still be improved if we change the last bit of the message like this:

Failed to parse YAML inside code chunk at lines 15-19. Mapping values are not allowed in a code chunk context. Error found at line 2, column 7 inside the code chunk.

connection: db
 label: test multiqueries -5
       ^~~~~~

What is your opinion on it? If you think that this is a good addition, I can make the changes for us.

Now, in addition to parser.R, since I basically changed the function signatures for partition_chunk() and parse_block(), I also had to make changes in two more places:

  • change all the calls to parse_block() inside the unit tests to include the new block index parameter.
  • change the partition_chunk() documentation to include the new block index parameter.

The changes above led me to also recompile all functions documentation with devtools::document().

Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll finish the rest of work by myself.

I think that maybe the above message can still be improved if we change the last bit of the message like this:

I agree, but I feel that we are hacking the message from the yaml package too much. Let's leave it unchanged for now, even though it's a little obscure (I think the ^~~~~ mark can offer enough guidance).

@yihui yihui linked an issue Oct 24, 2023 that may be closed by this pull request
@yihui yihui merged commit 8725408 into yihui:master Oct 24, 2023
10 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve YAML error thrown when parsing in-chunk options
2 participants