aws_security_group: DependencyViolation: resource sg-XXX has a dependent object #1671

Closed
hashibot opened this issue Sep 15, 2017 · 34 comments · Fixed by #26553
Labels
bug: Addresses a defect in current functionality.
service/ec2: Issues and PRs that pertain to the ec2 service.

Comments

@hashibot

This issue was originally opened by @brikis98 as hashicorp/terraform#11047. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

Terraform v0.8.2

Affected Resource(s)

  • aws_security_group

Terraform Configuration Files

This is part of a larger configuration, but I think the relevant parts are as follows.

Under modules/webserver-cluster/main.tf, I define a module with the following code:

resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"
  availability_zones   = ["${data.aws_availability_zones.all.names}"]
  load_balancers       = ["${aws_elb.example.name}"]
  health_check_type    = "ELB"

  min_size = 2
  max_size = 10
}

resource "aws_launch_configuration" "example" {
  image_id        = "ami-40d28157"
  instance_type   = "t2.micro"
  security_groups = ["${aws_security_group.instance.id}"]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "instance" {
  name = "my-security-group"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group_rule" "allow_http_inbound" {
  type              = "ingress"
  security_group_id = "${aws_security_group.instance.id}"

  from_port   = 80
  to_port     = 80
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}

data "aws_availability_zones" "all" {}

resource "aws_elb" "example" {
  name               = "my-example-elb"
  availability_zones = ["${data.aws_availability_zones.all.names}"]
  security_groups    = ["${aws_security_group.elb.id}"]

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 80
    instance_protocol = "http"
  }

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    interval            = 30
    target              = "HTTP:80/"
  }
}

resource "aws_security_group" "elb" {
  name = "elb"
}

resource "aws_security_group_rule" "allow_http_inbound" {
  type              = "ingress"
  security_group_id = "${aws_security_group.elb.id}"

  from_port   = 80
  to_port     = 80
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "allow_all_outbound" {
  type              = "egress"
  security_group_id = "${aws_security_group.elb.id}"

  from_port   = 0
  to_port     = 0
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
}

output "elb_security_group_id" {
  value = "${aws_security_group.elb.id}"
}

In a separate folder, I use this module in the usual way, but also add a custom security group rule:

module "webserver_cluster" {
  source = "modules/webserver-cluster"

  # ... pass various parameters ...
}

resource "aws_security_group_rule" "allow_testing_inbound" {
  type              = "ingress"
  security_group_id = "${module.webserver_cluster.elb_security_group_id}"

  from_port   = 12345
  to_port     = 12345
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}

Expected Behavior

I expect to be able to run terraform apply and terraform destroy without errors.

Actual Behavior

terraform apply works fine. Occasionally, terraform destroy fails with the following error:

aws_security_group.elb: DependencyViolation: resource sg-344baa48 has a dependent object

Steps to Reproduce

  1. terraform apply
  2. terraform destroy

Important Factoids

It's an intermittent issue, so I can't be sure, but I don't think this error happened with Terraform 0.7.x.

@sstarcher

I have run into this issue with terraform 0.10.6.

+ module.infrastructure.aws_security_group.sg
    id:                                          <computed>
    description:                                 "Allow traffic to sg from client security groups"
    egress.#:                                    <computed>
    ingress.#:                                   "1"
    ingress.522618655.cidr_blocks.#:             "0"
    ingress.522618655.from_port:                 "1234"
    ingress.522618655.ipv6_cidr_blocks.#:        "0"
    ingress.522618655.protocol:                  "tcp"
    ingress.522618655.security_groups.#:         "1"
    ingress.522618655.security_groups.980544208: "sg-175fa66a"
    ingress.522618655.self:                      "false"
    ingress.522618655.to_port:                   "1234"
    name:                                        "sg_ingress_ydqxa4"
    owner_id:                                    <computed>
    vpc_id:                                      "vpc-63741921"

The delete was retried multiple times:

* aws_security_group.sg: DependencyViolation: resource sg-234bb25e has a dependent object
	status code: 400, request id: bd64a44d-3e84-4ac4-a2c9-4e392f7c88a3

@s-nakka

s-nakka commented Sep 25, 2017

Terraform v0.9.9

Same issue

@ghost

ghost commented Oct 3, 2017

Terraform v0.10.7

Same issue. Is the only workaround to delete the SG manually and then recreate it via TF?

@llaski

llaski commented Oct 5, 2017

Same issue. For me, I'm 99% sure it's because there's an EC2 instance, not being changed, that is still using the security group. So right now it looks like I have to make this change manually.

@ghost

ghost commented Oct 6, 2017

Yes, that's indeed the case. I don't have access to the web UI (it's managed by the client), so I had to resolve it manually: I created an empty security group, replaced the existing one with that empty SG, and then reran the Terraform command. That worked fine.

@nmarchini

Same issue for us. Has anyone tested whether v0.10.8 fixes this?

@gaui

gaui commented Nov 10, 2017

I'm on v0.10.8 and have experienced this.

@gaui

gaui commented Nov 11, 2017

Got it again...

* module.network.module.aws_vpc.aws_security_group.default_private (destroy): 1 error(s) occurred:

* aws_security_group.default_private: DependencyViolation: resource sg-cd40e2b6 has a dependent object
        status code: 400, request id: 6da496ce-b444-4a5c-b85d-c4f2bbadf842
* module.network.module.aws_vpc.aws_subnet.public[0] (destroy): 1 error(s) occurred:

* aws_subnet.public.0: Error deleting subnet: timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 10m0s

@catsby
Contributor

catsby commented Nov 16, 2017

Hey all –

It sounds like, because this is intermittent, the failures happen when "aws_security_group_rule" "allow_testing_inbound" is set to be destroyed after the Security Group itself... I believe because the rule is dependent on the output of the module and not on the group itself. But I could be wrong.

As a workaround for that, version 1.2.0 of the AWS provider shipped with a new attribute on security_group called revoke_rules_on_delete:

Adding that to the security group in the module will likely work around this.
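
A minimal sketch of what that might look like on the module's "elb" group (illustrative only; the attribute requires AWS provider 1.2.0 or later):

resource "aws_security_group" "elb" {
  name = "elb"

  # Revoke the group's remaining rules (including ones created outside the
  # module, such as allow_testing_inbound) before deleting the group itself.
  revoke_rules_on_delete = true
}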

There was other mention of an instance still using the group; can anyone provide a configuration that triggers this with instances?

Thanks!

@brikis98
Contributor

brikis98 commented Dec 3, 2017

@catsby I just tried setting revoke_rules_on_delete to true on my Security Group, but I still get the exact same aws_security_group: DependencyViolation: resource sg-XXX has a dependent object error on destroy.

The code doesn't seem to be doing anything very complicated. The simplified version is as follows.

I have a module called single-server:

resource "aws_instance" "instance" {
  ami = "${var.ami}"
  instance_type = "${var.instance_type}"
  vpc_security_group_ids = ["${aws_security_group.instance.id}"]
  user_data = "${var.user_data}"
  # ... (other params omitted) ...
}

resource "aws_security_group" "instance" {
  name = "${var.name}"
  description = "Security Group for ${var.name}"
  vpc_id = "${var.vpc_id}"

  # This workaround, unfortunately, did not help
  revoke_rules_on_delete = true
}

resource "aws_security_group_rule" "allow_outbound_all" {
  type = "egress"
  from_port = 0
  to_port = 0
  protocol = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.instance.id}"
}

resource "aws_security_group_rule" "allow_inbound_ssh_from_cidr" {
  count = "${signum(var.allow_ssh_from_cidr)}"
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  cidr_blocks = ["${var.allow_ssh_from_cidr_list}"]
  security_group_id = "${aws_security_group.instance.id}"
}

resource "aws_security_group_rule" "allow_inbound_ssh_from_security_group" {
  count = "${signum(var.allow_ssh_from_security_group)}"
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  source_security_group_id = "${var.allow_ssh_from_security_group_id}"
  security_group_id = "${aws_security_group.instance.id}"
}

I'm using this module in some code that creates a server and two EBS volumes for it:

module "example" {
  source = "../../modules/single-server"

  name = "example"
  instance_type = "t2.micro"
  ami = "${var.ami}"

  allow_ssh_from_cidr_list = ["0.0.0.0/0"]

  vpc_id = "${data.aws_vpc.default.id}"
  subnet_id = "${data.aws_subnet.selected.id}"

  # Script that attaches and mounts the two EBS volumes
  user_data = "${data.template_file.user_data.rendered}"
}

resource "aws_ebs_volume" "example_1" {
  availability_zone = "${data.aws_subnet.selected.availability_zone}"
  type = "gp2"
  size = 5
}

resource "aws_ebs_volume" "example_2" {
  availability_zone = "${data.aws_subnet.selected.availability_zone}"
  type = "gp2"
  size = 5
}

We have automated tests that run against this code and do the following:

  1. Run apply.
  2. Wait for the server to boot.
  3. SSH to the server and write some files to the two EBS volumes.
  4. Change the server name parameter and re-run apply to force a redeploy.
  5. SSH to the server again to make sure the EBS volumes were properly re-attached and we can still read the files we wrote earlier.
  6. Run destroy.

All of this works until destroy, where we get the aws_security_group: DependencyViolation: resource sg-XXX has a dependent object error. It's happening fairly consistently lately, even though there is nothing in the code anywhere—including the Terraform code, User Data script, or test code—that in any way touches the security group, so I'm quite stumped what could possibly be triggering this problem.

@elektron9

I'm on 0.10.8 and am currently experiencing this issue. One tip for figuring out what the "dependent object" is: type the name of the SG into the search box on the EC2 Network Interfaces page (https://serverfault.com/a/866203/223606). It appears that an attached ENI is preventing Terraform from deleting my SG.
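
If you want to do the same lookup from Terraform, here is a rough sketch using the aws_network_interfaces data source (the SG ID is a placeholder, and this assumes a provider version that includes the data source):

data "aws_network_interfaces" "attached_to_sg" {
  filter {
    name   = "group-id"
    values = ["sg-0123456789abcdef0"] # the group that refuses to delete
  }
}

# Any IDs listed here are the "dependent objects" blocking the delete.
output "dependent_eni_ids" {
  value = data.aws_network_interfaces.attached_to_sg.ids
}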

@catsby
Contributor

catsby commented Dec 14, 2017

@brikis98 are you able to do as @elektron9 mentioned above and determine what the dependency is? The revoke_rules_on_delete parameter will only help here if the dependency is due to a security group rule that has caused a dependency loop with another security group. Perhaps yours is something else?

@brikis98
Contributor

I'll have to check next time I'm working on this code, but I'm pretty sure it's not an ENI, as there are no ENIs being created in that code.

@elektron9

elektron9 commented Jan 9, 2018

Update: we eventually determined that this issue was being caused by a security group that had an inbound rule for another security group. After we manually removed the inbound rule, Terraform was able to proceed with the destruction of the security group that was causing this issue.

We did toggle the revoke_rules_on_delete setting to true but the Terraform deploy of that change was blocked by this issue.

@texascloud

@elektron9 Can you confirm that revoke_rules_on_delete fixes the issue of being unable to delete a security group that had an inbound rule for another security group?

@elektron9

@CamelCaseNotation yes, our issue was resolved after setting revoke_rules_on_delete to true.

@BlaineBradbury

Spent several hours on various configurations trying to work around this. The DependencyViolation... has a dependent object error occurs after the 5-minute timeout in every scenario. The bottom line is that the network interface does not get assigned to the new security group when a new SG resource must be created (e.g. an SG name change). The new security group is created (if lifecycle create_before_destroy = true is set) as desired alongside the existing SG that is assigned to the ENI, but the ENI is never reassigned to the new SG.

While Terraform is waiting, I can go into the AWS console and do "change security groups" on either the network interface or the EC2 instance itself, and Terraform will immediately continue its process and remove the old SG before completing.

I also tried several iterations using aws_network_interface_sg_attachment without a security_group block on the aws_instance. This deploys fine, but it relies on the VPC's default security group for the initial launch of the EC2 instance, which is a security issue for us and leaves the problem of removing it after deployment. Anyway, the idea was to see whether a more explicit dependency on the ENI would cause Terraform to make the change on AWS (ENI to new SG). It did not work.
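
For reference, a rough sketch of that attachment approach (resource names are assumed, not taken from the actual config):

resource "aws_network_interface_sg_attachment" "example" {
  # Attach the SG directly to the instance's primary ENI instead of
  # declaring security groups on the aws_instance itself.
  security_group_id    = "${aws_security_group.instance.id}"
  network_interface_id = "${aws_instance.example.primary_network_interface_id}"
}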

Is there no explicit way of triggering the "Change Security Groups" functionality? It seems this should be done under the hood by Terraform whenever it recognizes that the SG will change for an instance or ENI.

@radeksimko added the service/ec2 label on Jan 28, 2018
@brikis98
Contributor

OK, I finally had some time to go back and dig into this, and I think I've figured out what's happening! The code looks roughly like this:

resource "aws_instance" "example" {
  # ... (other params omitted) ...

  vpc_security_group_ids = ["${aws_security_group.example.id}"]

  tags {
    Name = "${var.name}"
  }
}

resource "aws_security_group" "example" {
  name = "${var.name}"
  revoke_rules_on_delete = true
  # ... (other params omitted) ...
}

In our test code, we are updating var.name and running terraform apply. Changing the name of a security group means deleting the old one and replacing it with a new one... But Terraform can't do that because aws_instance.example still depends on it! That's why we are getting the DependencyViolation: resource sg-XXX has a dependent object error.

I think all we really need is a create_before_destroy = true on aws_security_group.example. I'll try that and report back.

@brikis98
Contributor

brikis98 commented Feb 21, 2018

OK, it looks like adding create_before_destroy = true, and using name_prefix instead of name fixed this issue. I can't believe it took me this long to figure it out!

resource "aws_security_group" "example" {
  name_prefix = "${var.name}"
  # ... (other params omitted) ...

  lifecycle {
    create_before_destroy = true
  }
}

@ura718

ura718 commented Feb 21, 2018

Sweet! I tested create_before_destroy = true and name_prefix instead of name, and that fixed it for me! Thank you @brikis98.

@gaui

gaui commented Feb 21, 2018

@brikis98 @ura718 you can also use revoke_rules_on_delete = true

@martinbokmankewill

I am experiencing the issue where security groups are not deleted, with the error referencing dependent objects, because they are attached to lingering ENIs.

The ENIs seem to be coming from an aws_launch_template/aws_autoscaling_group combo, and since I did not experience this behaviour when I was using aws_launch_configuration, I suspect that aws_launch_template is somehow the cause.

I have tried to solve the problem via revoke_rules_on_delete, lifecycle, and name_prefix, but none of them has any effect since the root cause is the lingering ENIs.

@DrHashi

DrHashi commented Aug 9, 2018

As of 0.11.7 it was fixed by lifecycle { create_before_destroy = true }

@jammerful
Contributor

@martinbokmankewill I've been running into the same issue recently as well. I noticed the lingering ENIs are almost always ones previously attached to an ELB. Are you still running into the issue?

@martinbokmankewill

I am not running into the issue anymore.

I traced it to not having set delete_on_termination = true in the network_interfaces part of the aws_launch_template resource I was using.
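
For reference, a minimal sketch of that fix (the rest of the launch template is assumed; only the relevant attributes are shown):

resource "aws_launch_template" "example" {
  name_prefix   = "example-"
  image_id      = var.ami_id # assumed variable
  instance_type = "t3.micro"

  network_interfaces {
    security_groups = [aws_security_group.instance.id]

    # Without this, the ENI can outlive the instance, stay attached to the
    # security group, and cause the DependencyViolation on destroy.
    delete_on_termination = true
  }
}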

@mvershinin-chwy

Nothing I try works. You can try it in this repo. Just make sure you have DEBUG enabled. Does anyone know the solution for this repo?

@adiii717

None of the above worked for me; I had to change the SG name in Terraform.

@Heliosmaster

Encountered this today as well.

lifecycle {
  create_before_destroy = true
}

fixed it for me as well.

@medains

medains commented Oct 21, 2021

I've also encountered this, and it's easily repeatable.

A cut down example:

data "aws_vpc" "default" {}

resource "aws_security_group" "efs" {
  name_prefix = "some_prefix"
  vpc_id      = data.aws_vpc.default.id
}

resource "aws_efs_file_system" "example" {
  encrypted = true
}

resource "aws_efs_mount_target" "example" {
  subnet_id       = var.some_valid_subnet
  security_groups = [aws_security_group.efs.id]
}

With these resources, a change requiring replacement of the security group will cause the apply to time out with the dependency violation.

Adding create_before_destroy to the security group works around the problem.

I think in this specific case the aws_efs_mount_target cannot be left without a security group, so Terraform attempts to destroy the SG without first removing it from the mount target. create_before_destroy changes the order of operations, so the mount targets get modified with the new security group before the (now deposed) security group is destroyed.
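
For completeness, the workaround applied to the security group above (same resource, plus the lifecycle block):

resource "aws_security_group" "efs" {
  name_prefix = "some_prefix"
  vpc_id      = data.aws_vpc.default.id

  lifecycle {
    # Create the replacement group and re-point the mount target to it
    # before the old group is destroyed.
    create_before_destroy = true
  }
}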

@flymg

flymg commented Dec 8, 2021

The problem is that

lifecycle {
  create_before_destroy = true
}

prevents updates, because two security groups with the same name cannot exist. In particular, the case with EFS mounts cannot be handled at the moment.

@medains

medains commented Dec 8, 2021

The problem is that

lifecycle {
  create_before_destroy = true
}

prevents updates, because two security groups with the same name cannot exist. In particular, the case with EFS mounts cannot be handled at the moment.

Yes, the security group has to use name_prefix instead of name to use this workaround.

ChrisBAshton added a commit to alphagov/govuk-aws that referenced this issue Jan 28, 2022
After #1530, we're hitting an error applying the plan:

```
Error: Error applying plan:

2 errors occurred:
	* aws_security_group.mysql-replica (destroy): 1 error occurred:
	* aws_security_group.mysql-replica: Error deleting security group: DependencyViolation: resource sg-26a3915d has a dependent object
	status code: 400, request id: 481cb159-77ee-46ef-813a-e82a9b91f754

	* aws_security_group.mysql-primary (destroy): 1 error occurred:
	* aws_security_group.mysql-primary: Error deleting security group: DependencyViolation: resource sg-d1bc8eaa has a dependent object
	status code: 400, request id: c9eee46f-f760-460f-929e-07e600d4c700
```

Trying a fix outlined in hashicorp/terraform-provider-aws#1671 (comment)
@sgyyz

sgyyz commented Apr 20, 2022

OK, it looks like adding create_before_destroy = true, and using name_prefix instead of name fixed this issue. I can't believe it took me this long to figure it out!

resource "aws_security_group" "example" {
  name_prefix = "${var.name}"
  # ... (other params omitted) ...

  lifecycle {
    create_before_destroy = true
  }
}

Saved my life. I verified that the create-before-destroy flow automatically replaces the old security group with the new one without any downtime.

@github-actions

github-actions bot commented Sep 2, 2022

This functionality has been released in v4.29.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

github-actions bot commented Oct 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Oct 3, 2022