Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combinatorial testing of HTTP parser; bug fixes #3140

Merged

Conversation

jonsequitur
Copy link
Contributor

No description provided.

@jonsequitur jonsequitur enabled auto-merge (rebase) August 17, 2023 22:50
}
else
{
diagnostics.AddRange(uriBindingResult.Diagnostics);
// FIX: (TryGetHttpRequestMessage) add diagnostic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to tell from the code whether this is handling the case where a null binding result is returned or where UrlNode was null. Can we make this else clause explicitly about the case where UrlNode is null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pattern makes this pretty explicit since TryGetUri can't return null. Adding an additional branch for that case would introduce an unreachable code branch.

@@ -109,9 +136,22 @@ public HttpBindingResult<HttpRequestMessage> TryGetHttpRequestMessage(HttpBindin
{
foreach (var headerNode in headerNodes)
{
if (headerNode.NameNode is null)
{
// FIX: (TryGetHttpRequestMessage) add diagnostic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an example of a case where calling TryGetRequest will report an error if the header name was missing. However, the loop inside HttpRequestKernel where we only bind variable expressions will not.

This also seems like a syntax error - so it is odd that we need to do binding to report this error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another reason why I feel that we should not try to bind stuff when syntax errors are present under the node (warnings are fine).

If we add an early out for the case where node.GetDiagnostics() contains errors, we could throw an exception over here (since we know that the parser would already have reported a diagnostic for this case and if it didn't that is a bug / exceptional condition).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an example of a case where calling TryGetRequest will report an error if the header name was missing.

This is a syntactic error and a diagnostic should be reported even if binding isn't attempted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should early out if there's a syntactic error in the same node that contains bindings but otherwise, we might as well bind and report every error we know about.

The decision not to proceed with sending a request belongs only at the highest level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern was basically that if we make this a binding-only diagnostic, then we will also need to account for it in other places where we compute and report diagnostics such as over here - but if we can always report a syntax error for this that sounds great.

I think we should early out if there's a syntactic error in the same node that contains bindings but otherwise, we might as well bind and report every error we know about.

It may not be a good experience to see 2 errors about the same problem (e.g., header name was missing) if they are reported from both syntax and binding. Thoughts around how we would prevent that?

Copy link
Contributor Author

@jonsequitur jonsequitur Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm not seeing how you would end up with a repeat diagnostic. If there's a missing header value (resulting in a syntactic diagnostic) then there can't be a binding diagnostic because there's nothing to attempt to bind. The binding diagnostics in any case will indicate e.g. an undefined variable, not a missing header value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I thought the TODO comment above implies that we should report a diagnostic here during binding if HeaderNameNode is null. Would this diagnostic not be a duplicate of the diagnostic that the parser would add to the tree when it finds that the name is missing?

My assumption is that all syntactic errors should be reported by the parser as it discovers missing / incorrect constructs during parsing. And I assumed you were saying the same thing in this comment -

This is a syntactic error and a diagnostic should be reported even if binding isn't attempted.

Since TryGetRequest currently includes all node level syntax errors under the HttpRequestNode as part of the returned diagnostics (on line 97 above), I think we would end up returning 2 diagnostics for this if my understanding is correct. But I am probably missing something obvious / fundamental...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there would be a duplicate but further testing will make it clear if there's an issue.

@@ -109,9 +136,22 @@ public HttpBindingResult<HttpRequestMessage> TryGetHttpRequestMessage(HttpBindin
{
Copy link
Contributor

@shyamnamboodiripad shyamnamboodiripad Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it that the HttpBodyNode contains a TryGetBody that returns a body object (and HttpUrlNode contains TryGetUri) but HttpHeaderNode does not contain a TryGetHeader that returns the header object? Should we create HttpHeaderNode.TryGetHeader and move the code for creating headers in there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no common type for headers and no uniform API on HttpRequestMessage to set them, so I'm not sure adding an abstraction would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see - I assumed there is always a common base type with multiple specializations. Sigh.. I wish the HttpRequestMessage API had a simpler API to deal with headers.


result.SyntaxTree.RootNode
.ChildNodes.Should().ContainSingle<HttpRequestNode>().Which
.MethodNode.ChildTokens.Single().Text.Should().Be(" \t ");
.ChildTokens.Single().Text.Should().Be(" \t ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider pulling out a shared variable for .ChildTokens.Single() and use that in both assertions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern, in the presence of AssertionScope (not used here), will result in a less informative exception if the test fails. Given the popularity of copy-pasting unit tests, I try to avoid patterns like that.

if (ParseComment() is { } commentNode)
{
node.Add(commentNode);
}
}
else if (CurrentToken is { Kind: HttpTokenKind.Punctuation } and { Text: "/" } &&
Copy link
Contributor

@shyamnamboodiripad shyamnamboodiripad Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: This else if block can be combined with the above one and this would ensure that we only have one call site for ParseComment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it. I'm not comfortable with the amount of repetition in the lookahead code and combining it made it harder to read, while ParseComment doesn't actually validate that we're in a comment. This factoring looks wrong so I tend to preserve the exploded code paths in order to see clearer branch coverage until that's corrected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check we perform here will get even more interesting with support for named requests as we will need to reject comments that contain the # @requestname syntax :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's another good reason to consolidate the lookaheads into the ParseComment (or future ParseCommentOrNamedRequest) method.

@@ -139,78 +147,102 @@ private void ConsumeCurrentTokenInto(HttpSyntaxNode node)

private HttpRequestNode ParseRequest()
{
var requestNode = new HttpRequestNode(
Copy link
Contributor

@shyamnamboodiripad shyamnamboodiripad Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have an early out here for the case where there are no more tokens (like we have in other Parse methods below)

Copy link
Contributor Author

@jonsequitur jonsequitur Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's redundant with the methods this one calls but assuming there are bugs there, I'd prefer to find them by adding a systematic set of negative tests rather than adding code that has no coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a quick check and if we add a MoreTokens check and early out after each inner parse call, those branches are covered and all tests still pass. It looks to me like early outs will increase cyclomatic complexity and potentially negatively impact performance for the more cases when the code is well-formed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is still one aspect of the pattern used in the Parse methods that I don't like as it would be easy to forget to add this early out when adding new constructs.

Not for the current PR, but it would be great to improve this and generally iron out the pattern that each Parse method follows so that it is easy to follow the same pattern in any Parse method for new constructs in the future. As we discussed offline, we can explore making it impossible to return null by having void returning parse methods now that we allow nodes to be constructed before its children are parsed.

One question with that I think would be whether we will end up with cases where there is some leading / trailing trivia that we fail to round trip because we didn't have a node where it could be added. But I'm sure we can figure out some way to solve that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is still one aspect of the pattern used in the Parse methods that I don't like as it would be easy to forget to add this early out when adding new constructs.

I don't worry about this as long as the tests are solid, and I'm pretty confident the combinatorial tests will catch this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to terminate the string with incomplete constructs to thoroughly test the early out where there are no more tokens (like the test we added yesterday where the string ended with a the # token (with missing comment body).

Many of these would also be in error scenarios which the combinatorial tests don't cover currently. But if we also extend the combinatorial testing to cover incorrect / incomplete syntax somehow that would catch this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the combinatorial tests to find cases where there should be diagnostics but there aren't, at node-level granularity. I'll start on that soon.

Copy link
Contributor

@shyamnamboodiripad shyamnamboodiripad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the changes inside HttpRequestParser.cs deeply but overall, the changes look good modulo comments.

It would be great to establish a common pattern that all Parse methods can follow so that it is easier to add parsing for new constructs in the future. But I think this can happen down the line as we make more improvements...

@jonsequitur jonsequitur merged commit af1f769 into dotnet:main Aug 18, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants