Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling often overwrites errors already set #740

Open
hidmic opened this issue Aug 11, 2020 · 12 comments
Open

Error handling often overwrites errors already set #740

hidmic opened this issue Aug 11, 2020 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@hidmic
Copy link
Contributor

hidmic commented Aug 11, 2020

Bug report

Required Info:

  • Operating System:
    • All Tier 1 platforms
  • Installation type:
    • From source
  • Version or commit hash:

Steps to reproduce issue

Run rcl tests locally:

colcon test --event-handlers=console_direct+ --packages-select rcl

Expected behavior

All tests pass and no error handling error (an overwrite) is reported via stderr.

Actual behavior

All tests pass but plenty error handling errors (overwrites) are reported.

Additional information

These overwrites are the result of either ignoring errors set by rcutils or rmw APIs or attempting to propagate them, which isn't necessarily bad practice if it weren't for the fact that all our libraries share the same global error state.

@hidmic hidmic added the bug Something isn't working label Aug 11, 2020
@hidmic
Copy link
Contributor Author

hidmic commented Aug 11, 2020

IMO our main issue here is that there's no clear path forward. Even if we decide to rely on all libraries sharing the same error handling state, rcutils error handling API was not designed to ease error propagation, and suppressing portions of an error chain either decontextualizes an error message or obscures its origin e.g. failed to duplicate string and publisher failed to initialize are equally useless error messages.

@hidmic
Copy link
Contributor Author

hidmic commented Aug 11, 2020

FYI @Blast545 @brawner.

@Blast545
Copy link
Contributor

For future reference, discussion started here: #734 (comment)

I am not really sure what's the best way to tackle this. Thought a little on this, what if a "build" flag is set then the overwritten errors are print to stderr and if not set the error is overwritten without the warning?

@hidmic
Copy link
Contributor Author

hidmic commented Aug 12, 2020

Thought a little on this, what if a "build" flag is set then the overwritten errors are print to stderr and if not set the error is overwritten without the warning?

Hmm, a possible outcome of this discussion is that we start setting error messages in every C function in a call chain. If we do that, then putting a message on stderr saying that there was an error with error handling and that a previous message was overwritten will never be correct.

@wjwwood
Copy link
Member

wjwwood commented Aug 18, 2020

What is supposed to happen is that when you encounter an error with an rcutils function, like failed to duplicate string, from within an rcl function is that you collecte the rcutils error, add it to your new error, clear the error state, and return with a new error state set, something like failed to initialize publisher: failed to duplicate string.

Any proposal to make this easier needs to (in my mind) consider two things:

  • printing to stderr is often not desired (maybe the error is expected and will be ignored/suppressed)
  • memory allocation should be avoided (the current error handling code in rcutils is memory static after the thread local storage is initialized)

@hidmic
Copy link
Contributor Author

hidmic commented Aug 18, 2020

That's one way forward, but I will point out that:

  • Propagating error states with the current API is clunky. Namely, chaining error states properly is hard, and we don't do it almost anywhere. It gets even harder when errors occur while handling errors (see Simplify error handling in failure mode or during finalization  rcutils#269).
  • Error handling buffers usually span a 4k page. If we use them to build backtraces, we have to deal with the possibility of running out of space.

@gbiggs
Copy link
Member

gbiggs commented Aug 19, 2020

Propagating error states with the current API is clunky. Namely, chaining error states properly is hard, and we don't do it almost anywhere. It gets even harder when errors occur while handling errors

Chunkiness can be fixed with a better API. But what interests me about your comment is the errors-while-handling-errors thing. Is this actually different from propagating an error from below?

@hidmic
Copy link
Contributor Author

hidmic commented Aug 19, 2020

Chunkiness can be fixed with a better API.

Sure. I was just stating that if we go that way we probably have to roll out new API.

Is this actually different from propagating an error from below?

If we want to build human-readable backtraces, errors that occur while dealing with errors cannot simply be chained. But perhaps that's a non-issue if we settle on always printing to stderr on ros2/rcutils#269.

@wjwwood
Copy link
Member

wjwwood commented Aug 19, 2020

Error handling buffers usually span a 4k page. If we use them to build backtraces, we have to deal with the possibility of running out of space.

I don't get this point, my suggestion was to not build backtraces but to handle existing errors properly, only combining their message text, as needed, into a single error state at all times. I don't see how we could run out of space.

If you're worried about the stack you can allocate error state objects on the heap, but I doubt that will be an issue.

@hidmic
Copy link
Contributor Author

hidmic commented Aug 19, 2020

I don't get this point, my suggestion was to not build backtraces but to handle existing errors properly, only combining their message text, as needed, into a single error state at all times.

Consider how would the error message look like if we start chaining errors using the "new message : old message" pattern (including file path and line number) in a 5+ levels-deep call stack. It seems to me that, to keep it readable, we have to format it as if it was a backtrace.

I don't see how we could run out of space.

The buffer can hold up to 4096 characters. Let's say that on average the file path is ~70 characters long and the log message is ~50 characters long. That's a roughly ~35 levels deep call stack. It's deep, but not unattainable.

@wjwwood
Copy link
Member

wjwwood commented Aug 19, 2020

I meant running out of space on the stack, not in the error message. Sorry if that wasn't clear.

If the error message gets deep it will look silly, but it will contain a lot of information, which might be useful. If the underlying causes are no longer important, you could choose to summarize the issue at any point in the chain.

@gbiggs
Copy link
Member

gbiggs commented Aug 24, 2020

I can see the value in having a backtrace-style output going down to the real source of an error, but I wonder if that's better handled by using log files to trace back to error causes. A backtrace mechanism sounds like complexity that is not strictly necessary.

It would look cool, though.

If the underlying causes are no longer important, you could choose to summarize the issue at any point in the chain.

I think this is important. We should probably be going case-by-case and understanding what the user needs to see at each point where an error is returned rather than giving them all the gritty details. One point of view is that error messages are for quick diagnostics, with log files for when the cause is not obvious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants