-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
combinatorial testing of HTTP parser; bug fixes #3140
combinatorial testing of HTTP parser; bug fixes #3140
Conversation
src/Microsoft.DotNet.Interactive.HttpRequestParser/HttpEmbeddedExpressionNode.cs
Show resolved
Hide resolved
} | ||
else | ||
{ | ||
diagnostics.AddRange(uriBindingResult.Diagnostics); | ||
// FIX: (TryGetHttpRequestMessage) add diagnostic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to tell from the code whether this is handling the case where a null binding result is returned or where UrlNode
was null. Can we make this else clause explicitly about the case where UrlNode
is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pattern makes this pretty explicit since TryGetUri
can't return null
. Adding an additional branch for that case would introduce an unreachable code branch.
@@ -109,9 +136,22 @@ public HttpBindingResult<HttpRequestMessage> TryGetHttpRequestMessage(HttpBindin | |||
{ | |||
foreach (var headerNode in headerNodes) | |||
{ | |||
if (headerNode.NameNode is null) | |||
{ | |||
// FIX: (TryGetHttpRequestMessage) add diagnostic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an example of a case where calling TryGetRequest
will report an error if the header name was missing. However, the loop inside HttpRequestKernel
where we only bind variable expressions will not.
This also seems like a syntax error - so it is odd that we need to do binding to report this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another reason why I feel that we should not try to bind stuff when syntax errors are present under the node (warnings are fine).
If we add an early out for the case where node.GetDiagnostics()
contains errors, we could throw an exception over here (since we know that the parser would already have reported a diagnostic for this case and if it didn't that is a bug / exceptional condition).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an example of a case where calling TryGetRequest will report an error if the header name was missing.
This is a syntactic error and a diagnostic should be reported even if binding isn't attempted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should early out if there's a syntactic error in the same node that contains bindings but otherwise, we might as well bind and report every error we know about.
The decision not to proceed with sending a request belongs only at the highest level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern was basically that if we make this a binding-only diagnostic, then we will also need to account for it in other places where we compute and report diagnostics such as over here - but if we can always report a syntax error for this that sounds great.
I think we should early out if there's a syntactic error in the same node that contains bindings but otherwise, we might as well bind and report every error we know about.
It may not be a good experience to see 2 errors about the same problem (e.g., header name was missing) if they are reported from both syntax and binding. Thoughts around how we would prevent that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm not seeing how you would end up with a repeat diagnostic. If there's a missing header value (resulting in a syntactic diagnostic) then there can't be a binding diagnostic because there's nothing to attempt to bind. The binding diagnostics in any case will indicate e.g. an undefined variable, not a missing header value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I thought the TODO
comment above implies that we should report a diagnostic here during binding if HeaderNameNode
is null
. Would this diagnostic not be a duplicate of the diagnostic that the parser would add to the tree when it finds that the name is missing?
My assumption is that all syntactic errors should be reported by the parser as it discovers missing / incorrect constructs during parsing. And I assumed you were saying the same thing in this comment -
This is a syntactic error and a diagnostic should be reported even if binding isn't attempted.
Since TryGetRequest
currently includes all node level syntax errors under the HttpRequestNode
as part of the returned diagnostics (on line 97 above), I think we would end up returning 2 diagnostics for this if my understanding is correct. But I am probably missing something obvious / fundamental...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there would be a duplicate but further testing will make it clear if there's an issue.
@@ -109,9 +136,22 @@ public HttpBindingResult<HttpRequestMessage> TryGetHttpRequestMessage(HttpBindin | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it that the HttpBodyNode
contains a TryGetBody
that returns a body object (and HttpUrlNode
contains TryGetUri
) but HttpHeaderNode
does not contain a TryGetHeader
that returns the header object? Should we create HttpHeaderNode.TryGetHeader
and move the code for creating headers in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no common type for headers and no uniform API on HttpRequestMessage
to set them, so I'm not sure adding an abstraction would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see - I assumed there is always a common base type with multiple specializations. Sigh.. I wish the HttpRequestMessage
API had a simpler API to deal with headers.
src/Microsoft.DotNet.Interactive.HttpRequest.Tests/ParserTests.Comments.cs
Show resolved
Hide resolved
|
||
result.SyntaxTree.RootNode | ||
.ChildNodes.Should().ContainSingle<HttpRequestNode>().Which | ||
.MethodNode.ChildTokens.Single().Text.Should().Be(" \t "); | ||
.ChildTokens.Single().Text.Should().Be(" \t "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider pulling out a shared variable for .ChildTokens.Single()
and use that in both assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern, in the presence of AssertionScope
(not used here), will result in a less informative exception if the test fails. Given the popularity of copy-pasting unit tests, I try to avoid patterns like that.
if (ParseComment() is { } commentNode) | ||
{ | ||
node.Add(commentNode); | ||
} | ||
} | ||
else if (CurrentToken is { Kind: HttpTokenKind.Punctuation } and { Text: "/" } && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: This else if
block can be combined with the above one and this would ensure that we only have one call site for ParseComment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered it. I'm not comfortable with the amount of repetition in the lookahead code and combining it made it harder to read, while ParseComment
doesn't actually validate that we're in a comment. This factoring looks wrong so I tend to preserve the exploded code paths in order to see clearer branch coverage until that's corrected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check we perform here will get even more interesting with support for named requests as we will need to reject comments that contain the # @requestname
syntax :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's another good reason to consolidate the lookaheads into the ParseComment
(or future ParseCommentOrNamedRequest
) method.
@@ -139,78 +147,102 @@ private void ConsumeCurrentTokenInto(HttpSyntaxNode node) | |||
|
|||
private HttpRequestNode ParseRequest() | |||
{ | |||
var requestNode = new HttpRequestNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have an early out here for the case where there are no more tokens (like we have in other Parse methods below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect it's redundant with the methods this one calls but assuming there are bugs there, I'd prefer to find them by adding a systematic set of negative tests rather than adding code that has no coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a quick check and if we add a MoreTokens
check and early out after each inner parse call, those branches are covered and all tests still pass. It looks to me like early outs will increase cyclomatic complexity and potentially negatively impact performance for the more cases when the code is well-formed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is still one aspect of the pattern used in the Parse methods that I don't like as it would be easy to forget to add this early out when adding new constructs.
Not for the current PR, but it would be great to improve this and generally iron out the pattern that each Parse method follows so that it is easy to follow the same pattern in any Parse method for new constructs in the future. As we discussed offline, we can explore making it impossible to return null
by having void
returning parse methods now that we allow nodes to be constructed before its children are parsed.
One question with that I think would be whether we will end up with cases where there is some leading / trailing trivia that we fail to round trip because we didn't have a node where it could be added. But I'm sure we can figure out some way to solve that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is still one aspect of the pattern used in the Parse methods that I don't like as it would be easy to forget to add this early out when adding new constructs.
I don't worry about this as long as the tests are solid, and I'm pretty confident the combinatorial tests will catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to terminate the string with incomplete constructs to thoroughly test the early out where there are no more tokens (like the test we added yesterday where the string ended with a the #
token (with missing comment body).
Many of these would also be in error scenarios which the combinatorial tests don't cover currently. But if we also extend the combinatorial testing to cover incorrect / incomplete syntax somehow that would catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use the combinatorial tests to find cases where there should be diagnostics but there aren't, at node-level granularity. I'll start on that soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the changes inside HttpRequestParser.cs
deeply but overall, the changes look good modulo comments.
It would be great to establish a common pattern that all Parse methods can follow so that it is easier to add parsing for new constructs in the future. But I think this can happen down the line as we make more improvements...
No description provided.