Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Elasticsearch HTTP error handling #4922

Merged
merged 4 commits into from
Oct 11, 2021

Conversation

barkbay
Copy link
Contributor

@barkbay barkbay commented Oct 4, 2021

This PR fixes #4918.

It also removes response *http.Response from the APIError object, which is currently preventing Error() from being called multiple times:

// Error() implements the error interface.
func (e *APIError) Error() string {
	defer e.response.Body.Close() # body is closed the first
	reason := "unknown"
	// Elasticsearch has a detailed error message in the response body
	var errMsg ErrorResponse
	err := json.NewDecoder(e.response.Body).Decode(&errMsg) // subsequent calls fail
	if err == nil {
		reason = errMsg.Error.Reason
	}
	return fmt.Sprintf("%s: %s", e.response.Status, reason)
}

This is for example the case when an APIError is aggregated in zapcode.encodeError :

func encodeError(key string, err error, enc ObjectEncoder) (retErr error) {

        basic := err.Error() // Error() is called a first time

        switch e := err.(type) {
        case errorGroup:
                return enc.AddArray(key+"Causes", errArray(e.Errors())) // Error() is called a second time

This is the reason why the error message may contain unknown in the error cause:

{
    "log.level": "error",
    "@timestamp": "2021-09-23T14:25:54.121Z",
    "log.logger": "manager.eck-operator.controller.elasticsearch-controller",
    "message": "Reconciler error",
    "service.version": "1.7.1+26876766",
    "service.type": "eck",
    "ecs.version": "1.4.0",
    "name": "my-cluster",
    "namespace": "elastic-system",
    "error": "400 Bad Request: closed",
    "errorCauses": [
        {
            "error": "400 Bad Request: unknown"
        }
    ],
    "error.stack_trace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.2/pkg/internal/controller/controller.go:214"
}

@barkbay barkbay added >enhancement Enhancement of existing functionality v1.9.0 labels Oct 4, 2021
@barkbay
Copy link
Contributor Author

barkbay commented Oct 4, 2021

Jenkins test this please

#4692

@barkbay
Copy link
Contributor Author

barkbay commented Oct 4, 2021

Jenkins test this please

java.lang.IllegalStateException: JENKINS-37121: something already locked /var/lib/jenkins/workspace/cloud-on-k8s-pr

pkg/controller/elasticsearch/client/error.go Outdated Show resolved Hide resolved
pkg/controller/elasticsearch/client/error.go Outdated Show resolved Hide resolved
StatusCode: response.StatusCode,
}
// We may need to read the body multiple times, read the full body and store it as an array of bytes.
body, err := ioutil.ReadAll(response.Body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would put the defer response.Body.Close() statement here (right after we decide to use it, don't forget to close it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, the response body is always read by body, err := ioutil.ReadAll(response.Body), there is no condition. This is why the defer response.Body.Close() is right before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by convention people tend to put the defer statement immediately after the line that requires it (effective go example, uber go style example), but I really don't mind much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the situation is a bit different here since the stream is already opened. The purpose is to be sure that it is effectively closed when we exit the function and no one else will read from it later.

}
// Reset the response body to the original unread state. It allows a caller to read again the body if necessary,
// for example in the case of a 408.
response.Body = ioutil.NopCloser(bytes.NewBuffer(body))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we read that body in case of 408? Or just check the status code?
The code flow here gets a bit tricky. I'm wondering if it may be simpler to always return either response, nil or nil, err for any http request we make. So that function that creates the APIError would read the response body, and then we don't care about that response body anymore, and there's no way to retrieve it (it is still stored in the error object if we need it).
Checking whether the api call we made returned a 408 would be a call to a helper function like client.IsTimeout(err).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we read that body in case of 408? Or just check the status code?

In case of a 408, and if the path is /_cluster/health, then the error is later skipped, and baseClient.request(..) attempts to read again the body from the response.

I think I wanted to be conservative with this PR, keeping the responsibility of reading the "common" errors here, while delegating the more specific ones to the callers. This is the current flow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I tend to feel like this is more complicated than it should be, but I'm fine with keeping that PR small and focused on the problem it solves.

var errorResponse ErrorResponse
if err := json.Unmarshal(body, &errorResponse); err != nil {
// Only log at the debug level since it is expected to not be able to parse all types of errors.
// Some errors, like 408 on /_cluster/health may return a different body structure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know and expect 408 will return a different body, should we just not parse the error body in case of 408?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky because we don't know how to parse the body in case of a "custom" error here.
For now it's the responsibility of baseClient.request(..,responseObj interface{},skipErrFunc func(error) bool) to decide:

  1. If an API error can be skipped (using skipErrFunc) and should not prevent the body to be parsed. In the case of a 408, the 2 conditions are:
    • path is /_cluster/health
    • code is 408
  2. Unmarshall the response using the struct provided by the caller (responseObj interface{})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I tend to feel like this is more complicated than it should be, but I'm fine with keeping that PR small and focused on the problem it solves.

pkg/controller/elasticsearch/client/error.go Outdated Show resolved Hide resolved
pkg/controller/elasticsearch/client/base.go Show resolved Hide resolved
pkg/controller/elasticsearch/client/base.go Show resolved Hide resolved
Copy link
Contributor

@sebgl sebgl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@barkbay
Copy link
Contributor Author

barkbay commented Oct 8, 2021

@thbkrkr I removed FakeAPIError from client.go (see here), it was not in a unit test file and also I don't think it was really required.
Could you check I did not break anything ? Thanks !

@barkbay barkbay requested a review from thbkrkr October 8, 2021 04:45
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@barkbay barkbay merged commit 995403b into elastic:master Oct 11, 2021
@barkbay barkbay deleted the refactor/es-http-errors branch October 11, 2021 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v1.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch client should enrich errors returned with context information about the request
3 participants