-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
draft: what printf-like error message might look like #1426
draft: what printf-like error message might look like #1426
Conversation
f23c9c7
to
8c0d650
Compare
It seems likely that Windows doesn't support positional format specifiers (which I think we would need for robust translation support). See the failing test and the Microsoft docs. |
Sorry I haven't gotten a chance to look at this this week, but I will first thing next week! |
Okay so I've had some time to look at this for a while now and I'd like to suggest a couple of things and offer some thoughts. First of all, I really love what you've done here. I think being able to put extra information into error messages is going to make this a much more user-friendly experience. It's clear you were thinking about it and I appreciate it! Now that I've seen this work, I think we shouldn't go with printf-style formatting, mostly for the reasons you mentioned. But also, it seems like if we do go with printf-style formatting, it's going to "infect" lots of the call stack into having to accept all of the various parameters necessary for the error message, which seems not great. For these reasons I'd like to go in a different direction that doesn't involve printf-style formatting and also doesn't involve stdargs. To solve these problems, I'd like to suggest a particular approach. That being said, I'm not looking for this to be 100% prescriptive. So if you like this direction, let's do it, otherwise let's talk about (i.e., please don't read this as a demand). I think we shouldn't attempt to build the error messages in C at all. Instead, since we know the number of substitutions at compile-time, so we could template stuff out like we do with the node structs. Then, instead of passing around // The base struct for all of the diagnostic messages.
typedef struct {
yp_diagnostic_id_t type;
} yp_diagnostic_message_t;
// Template out diagnostic messages that have substitutions, otherwise just use base.
typedef struct {
yp_diagnostic_message_t base;
const uint8_t *encoding;
size_t encoding_length;
} yp_error_invalid_encoding_magic_comment_t;
// Update diagnostic structure to have an embedded message.
typedef struct {
yp_list_node_t node;
const uint8_t *start;
const uint8_t *end;
yp_diagnostic_message_t *message;
} yp_diagnostic_t;
// Update the append method to accept a pointer to a message.
bool
yp_diagnostic_list_append(yp_list_t *list, const uint8_t *start, const uint8_t *end, yp_diagnostic_message_t *message) {
yp_diagnostic_t *diagnostic = (yp_diagnostic_t *) malloc(sizeof(yp_diagnostic_t));
if (diagnostic == NULL) return false;
diagnostic->start = start;
diagnostic->end = end;
switch (message->type) {
case YP_ERR_INVALID_ENCODING_MAGIC_COMMENT:
diagnostic->message = malloc(sizeof(yp_error_invalid_encoding_magic_comment_t));
diagnostic->message = *((yp_error_invalid_encoding_magic_comment_t *) message);
break;
default:
diagnostic->message = malloc(sizeof(yp_diagnostic_message_t));
diagnostic->message = *message;
break;
}
yp_list_append(list, (yp_list_node_t *) diagnostic);
return true;
} As an optimization on top of this that we could do in another pass, since we'd be passing around pointers, we could do a trick for the errors that do not contain substitutions and tag the pointer in those cases. So for example we would do: bool
yp_diagnostic_list_append(yp_list_t *list, const uint8_t *start, const uint8_t *end, yp_diagnostic_message_t *message) {
yp_diagnostic_t *diagnostic = (yp_diagnostic_t *) malloc(sizeof(yp_diagnostic_t));
if (diagnostic == NULL) return false;
diagnostic->start = start;
diagnostic->end = end;
if (((uintptr_t) message) & 1) {
diagnostic->message = message;
} else {
switch (message->type) {
case YP_ERR_INVALID_ENCODING_MAGIC_COMMENT:
diagnostic->message = malloc(sizeof(yp_error_invalid_encoding_magic_comment_t));
diagnostic->message = *((yp_error_invalid_encoding_magic_comment_t *) message);
break;
default:
assert(false && "Invalid error message");
abort();
}
}
yp_list_append(list, (yp_list_node_t *) diagnostic);
return true;
} The benefit being that all of the way up the call stack with Then when we're translating into Ruby, we would also template out subclasses of module YARP
class InvalidEncodingCommentError < SyntaxError
attr_reader :encoding
def initialize(encoding)
@encoding = encoding
super("Unknown or invalid encoding `#{encoding}` in the magic comment")
end
end
end When we're serializing, we would serialize the type and then each of the substitutions. When deserializing, a similar process would happen where we could build the errors from the serialized output. The reason I like this approach is that it leaves the string manipulation/potential for translation entirely out of C, so we don't have to mess with varargs or printf. I also like that it means we don't have to allocate a message every time. I recognize that this is quite a bigger change than you had originally intended with this PR, so I'm sorry about that. Let me know what you think. |
@kddnewton I agree the printf-style formatting is not the right choice. I like what you've sketched out here, and I'm going to spike it out to see how it feels. |
Looking for feedback on this "spike" implementation of printf-style formatted error messages, first described in part in #941.
Some positive things:
Some questionable things I'd like feedback on:
%
characters in the error messages%1$.*2$s
)vasprintf
which makes me feel ickyyarp.c:expect()
, we'll need a variation ofyp_diagnostic_list_append
(yp_diagnostic_list_vappend
?)parse_starred_expression
we may need more variations on functions (e.g.,.parse_expression
/parse_expression_va
?)__attribute__((format(printf)))
because the format strings aren't constants, but if someone can teach me that I'm wrong I'd love to understand this better.A linter can probably help us manage the format specifier checks, if we determine we need it.