Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/resource: require explicit usage of NonRetryableError and RetryableError #17220

Closed
wants to merge 1 commit into from

Conversation

bflad
Copy link
Contributor

@bflad bflad commented Jan 28, 2018

It is currently possible to introduce subtle bugs or crashes when using resource.RetryableError and resource.NonRetryableError and allowing a nil error input. This PR proposes behavior that requires providers to be explicit with their usage of these errors, otherwise returns a bug reporting message to the operator.

For example (https://github.com/terraform-providers/terraform-provider-aws/blob/4c0387645f982ddcd51b3ffe2cc8992c06fb9c2c/aws/resource_aws_elastic_beanstalk_application.go#L105):

	var app *elasticbeanstalk.ApplicationDescription
	err := resource.Retry(30*time.Second, func() *resource.RetryError {
		var err error
		app, err = getBeanstalkApplication(d.Id(), conn)
		if err != nil {
			return resource.NonRetryableError(err)
		}

		if app == nil {
			if d.IsNewResource() {
				return resource.RetryableError(fmt.Errorf("Elastic Beanstalk Application %q not found", d.Id()))
			}
			// err is nil here
			return resource.NonRetryableError(err)
		}
		return nil
	})
	if err != nil {
		return err
	}
	// app can be nil here, so this can crash Terraform
	d.Set("name", app.ApplicationName)

Another example (https://github.com/terraform-providers/terraform-provider-alicloud/blob/927a8e82386e0b32718aa5eef254255fdc36b070/alicloud/resource_alicloud_disk_attachment.go#L112):

	return resource.Retry(5*time.Minute, func() *resource.RetryError {
		err := conn.DetachDisk(instanceID, diskID)
		if err != nil {
			if IsExceptedError(err, DiskIncorrectStatus) || IsExceptedError(err, InstanceLockedForSecurity) ||
				IsExceptedError(err, DiskInvalidOperation) {
				return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err))
			}
		}

		disks, _, descErr := conn.DescribeDisks(&ecs.DescribeDisksArgs{
			RegionId: getRegion(d, meta),
			DiskIds:  []string{diskID},
		})

		if descErr != nil {
			log.Printf("[ERROR] Disk %s is not detached.", diskID)
			// err can be nil here
			return resource.NonRetryableError(err)
		}

		for _, disk := range disks {
			if disk.Status != ecs.DiskStatusAvailable {
				return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err))
			}
		}
		return nil
	})

@bflad
Copy link
Contributor Author

bflad commented Mar 12, 2019

Found another reference to this in the Terraform AWS Provider; this will never retry:

https://github.com/terraform-providers/terraform-provider-aws/blob/75c32b375140813b7994f8031e74b8588a08035a/aws/resource_aws_kms_alias.go#L176-L190

func retryFindKmsAliasByName(conn *kms.KMS, name string) (*kms.AliasListEntry, error) {
	var resp *kms.AliasListEntry
	err := resource.Retry(1*time.Minute, func() *resource.RetryError {
		var err error
		resp, err = findKmsAliasByName(conn, name, nil)
		if err != nil {
			return resource.NonRetryableError(err)
		}
		if resp == nil {
			// err is nil here, so returns nil
			return resource.RetryableError(err)
		}
		return nil
	})
	return resp, err
}

bflad added a commit to hashicorp/terraform-provider-aws that referenced this pull request Mar 13, 2019
… after creation due to eventual consistency

References:
* #7891
* #6560
* #7873
* hashicorp/terraform#17220

The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)`  is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation.

Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries.

Output from acceptance testing:

```
--- PASS: TestAccAWSKmsAlias_no_name (37.63s)
--- PASS: TestAccAWSKmsAlias_name_prefix (37.80s)
--- PASS: TestAccAWSKmsAlias_multiple (38.38s)
--- PASS: TestAccAWSKmsAlias_importBasic (40.13s)
--- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s)
--- PASS: TestAccAWSKmsAlias_basic (46.76s)
```
nywilken pushed a commit to hashicorp/terraform-provider-aws that referenced this pull request Mar 14, 2019
… after creation due to eventual consistency (#7907)

References:
* #7891
* #6560
* #7873
* hashicorp/terraform#17220

The KMS service has eventual consistency considerations and the `aws_kms_alias` resource immediately tries to read the KMS alias after creation, which may not find the KMS alias. When not able to find the KMS alias, the resource logic returns an empty API object instead of an error. Since a `nil` check was already performed on the error, the error will always be `nil`. Invoking `return resource.RetryableError(nil)`  is equivalent to `return nil`. The resource during its Read performs an error check first which will skip because its `nil`, then assumes the resource has been deleted outside Terraform and triggers recreation.

Here when we cannot find a KMS alias after allowing some time for eventual consistency, we return a resource not found error and ensure we handle any timeouts due to automatic AWS Go SDK retries.

Output from acceptance testing:

```
--- PASS: TestAccAWSKmsAlias_no_name (37.63s)
--- PASS: TestAccAWSKmsAlias_name_prefix (37.80s)
--- PASS: TestAccAWSKmsAlias_multiple (38.38s)
--- PASS: TestAccAWSKmsAlias_importBasic (40.13s)
--- PASS: TestAccAWSKmsAlias_ArnDiffSuppress (43.61s)
--- PASS: TestAccAWSKmsAlias_basic (46.76s)
```
@bflad
Copy link
Contributor Author

bflad commented Aug 19, 2019

Another instance of this issue here: hashicorp/terraform-provider-aws#9779 (comment)

https://github.com/terraform-providers/terraform-provider-aws/blob/00909998d919faf5494ab8f6b38241deb1957d99/aws/resource_aws_internet_gateway.go#L55-L65

	err = resource.Retry(5*time.Minute, func() *resource.RetryError {
		igRaw, _, err := IGStateRefreshFunc(conn, d.Id())()
		if igRaw != nil {
			return nil
		}
		if err == nil {
			// err is always nil here, so it will never retry
			return resource.RetryableError(err)
		} else {
			return resource.NonRetryableError(err)
		}
	})

bflad added a commit to hashicorp/terraform-plugin-sdk that referenced this pull request Oct 8, 2019
…yableError

Reference: hashicorp/terraform#17220
Reference: hashicorp/terraform-provider-aws#9779 (comment)
Reference: hashicorp/terraform-provider-aws#9812 (comment)

It is currently possible to introduce subtle bugs or crashes when using `resource.RetryableError` and `resource.NonRetryableError` and allowing a `nil` error input. This PR proposes behavior that requires providers to be explicit with their usage of these errors, otherwise returns a bug reporting message to the operator.

For example (https://github.com/terraform-providers/terraform-provider-aws/blob/4c0387645f982ddcd51b3ffe2cc8992c06fb9c2c/aws/resource_aws_elastic_beanstalk_application.go#L105):

```go
	var app *elasticbeanstalk.ApplicationDescription
	err := resource.Retry(30*time.Second, func() *resource.RetryError {
		var err error
		app, err = getBeanstalkApplication(d.Id(), conn)
		if err != nil {
			return resource.NonRetryableError(err)
		}

		if app == nil {
			if d.IsNewResource() {
				return resource.RetryableError(fmt.Errorf("Elastic Beanstalk Application %q not found", d.Id()))
			}
			// err is nil here
			return resource.NonRetryableError(err)
		}
		return nil
	})
	if err != nil {
		return err
	}
	// app can be nil here, so this can crash Terraform
	d.Set("name", app.ApplicationName)
```

Another example (https://github.com/terraform-providers/terraform-provider-alicloud/blob/927a8e82386e0b32718aa5eef254255fdc36b070/alicloud/resource_alicloud_disk_attachment.go#L112):

```go
	return resource.Retry(5*time.Minute, func() *resource.RetryError {
		err := conn.DetachDisk(instanceID, diskID)
		if err != nil {
			if IsExceptedError(err, DiskIncorrectStatus) || IsExceptedError(err, InstanceLockedForSecurity) ||
				IsExceptedError(err, DiskInvalidOperation) {
				return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err))
			}
		}

		disks, _, descErr := conn.DescribeDisks(&ecs.DescribeDisksArgs{
			RegionId: getRegion(d, meta),
			DiskIds:  []string{diskID},
		})

		if descErr != nil {
			log.Printf("[ERROR] Disk %s is not detached.", diskID)
			// err can be nil here
			return resource.NonRetryableError(err)
		}

		for _, disk := range disks {
			if disk.Status != ecs.DiskStatusAvailable {
				return resource.RetryableError(fmt.Errorf("Detach Disk timeout and got an error: %#v", err))
			}
		}
		return nil
	})
```

Another example (https://github.com/terraform-providers/terraform-provider-aws/blob/75c32b375140813b7994f8031e74b8588a08035a/aws/resource_aws_kms_alias.go#L176-L190):

```go
func retryFindKmsAliasByName(conn *kms.KMS, name string) (*kms.AliasListEntry, error) {
	var resp *kms.AliasListEntry
	err := resource.Retry(1*time.Minute, func() *resource.RetryError {
		var err error
		resp, err = findKmsAliasByName(conn, name, nil)
		if err != nil {
			return resource.NonRetryableError(err)
		}
		if resp == nil {
			// err is nil here, so returns nil
			return resource.RetryableError(err)
		}
		return nil
	})
	return resp, err
}
```

Another example (https://github.com/terraform-providers/terraform-provider-aws/blob/00909998d919faf5494ab8f6b38241deb1957d99/aws/resource_aws_internet_gateway.go#L55-L65):

```go
	err = resource.Retry(5*time.Minute, func() *resource.RetryError {
		igRaw, _, err := IGStateRefreshFunc(conn, d.Id())()
		if igRaw != nil {
			return nil
		}
		if err == nil {
			// err is always nil here, so it will never retry
			return resource.RetryableError(err)
		} else {
			return resource.NonRetryableError(err)
		}
	})
```
@bflad
Copy link
Contributor Author

bflad commented Oct 8, 2019

Closing in preference of hashicorp/terraform-plugin-sdk#199

@bflad bflad closed this Oct 8, 2019
@bflad bflad deleted the f-return-retryerror-when-nil branch October 8, 2019 16:38
@ghost
Copy link

ghost commented Nov 8, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Nov 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant