Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terraform sending incorrect parameters to AWS API when revoking security group rules #8769

Closed
ghost opened this issue May 24, 2019 · 10 comments · Fixed by #26553
Closed

Terraform sending incorrect parameters to AWS API when revoking security group rules #8769

ghost opened this issue May 24, 2019 · 10 comments · Fixed by #26553
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.
Milestone

Comments

@ghost
Copy link

ghost commented May 24, 2019

This issue was originally opened by @bstrohmeyer as hashicorp/terraform#21437. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

Terraform v0.12.0
* provider.aws: version = "~> 2.12"

Terraform Configuration Files

module "sg_common" {
  source      = "./modules/security-group"
  name        = "sg_common"
  description = "Common access security group"
  vpc         = "${data.terraform_remote_state.vpc.outputs.vpc}"

  rules = [
    {
      type        = "ingress"
      from_port   = -1
      to_port     = -1
      protocol    = "icmp"
      description = "Allow ICMP"
      cidr        = ["0.0.0.0/0"]
    },
    {
      type        = "ingress"
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      description = "SSH from Bastion"
      source_sg   = module.sg_sc-bastion.id
    },
    {
      type        = "ingress"
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      description = "SSH from Gitlab Runners"
      cidr        = ["10.10.10.0/24"]
    },
    {
      type        = "egress"
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      description = "Outbound Traffic"
      cidr        = ["0.0.0.0/0"]
    },
  ]
}

subnets module:

resource "aws_subnet" "public" {
  vpc_id            = "${var.vpc["vpc_id"]}"
  cidr_block        = "${var.public.subnet_cidr}"
  availability_zone = "${var.subnet_az}"

  tags = {
    Name = "${var.public.subnet_name}"
  }
}

resource "aws_route_table" "public_route_table" {
  vpc_id = "${var.vpc["vpc_id"]}"

  tags = {
    Name = "public-${var.subnet_az}"
  }
}

resource "aws_route_table" "private_route_table" {
  vpc_id = "${var.vpc["vpc_id"]}"

  tags = {
    Name = "private-${var.subnet_az}"
  }
}

resource "aws_route" "igw_route" {
  route_table_id         = "${aws_route_table.public_route_table.id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${var.vpc["igw_id"]}"
}

resource "aws_route_table_association" "rt_assn" {
  subnet_id      = "${aws_subnet.public.id}"
  route_table_id = "${aws_route_table.public_route_table.id}"
}

resource "aws_eip" "nat" {
  vpc      = true
}

resource "aws_nat_gateway" "natgw" {
  allocation_id = "${aws_eip.nat.id}"
  subnet_id     = "${aws_subnet.public.id}"

  tags = {
    Name = "natgw-${var.subnet_az}"
  }
}

resource "aws_route" "nat_route" {
  route_table_id         = "${aws_route_table.private_route_table.id}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id             = "${aws_nat_gateway.natgw.id}"
}

resource "aws_subnet" "private" {
  count             = length(var.private)
  vpc_id            = var.vpc["vpc_id"]
  cidr_block        = lookup(var.private[count.index], "subnet_cidr")
  availability_zone = var.subnet_az

  tags = {
    Name = lookup(var.private[count.index], "subnet_name")
  }
}

resource "aws_route_table_association" "rt_assn_private" {
  count          = length(var.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private_route_table.id
}

Debug Output

Crash Output

Expected Behavior

Added new rule to security group:

    {
      type        = "ingress"
      from_port   = 22
      to_port     = 22
      protocol    = "tcp"
      description = "SSH from Gitlab Runners"
      cidr        = ["10.10.10.0/24"]
    },

Expected Terraform to add this rule to the existing security group.

Actual Behavior

This rule was added in the middle of the list, which then due to the design of the module would require terraform to delete the Egress rule as the list order was upset. This is expected and is reflected in the terraform plan:

 # module.sg_common.aws_security_group_rule.source_cidr[2] must be replaced
-/+ resource "aws_security_group_rule" "source_cidr" {
      ~ cidr_blocks              = [ # forces replacement
          - "0.0.0.0/0",
          + "10.10.10.0/24",
        ]
      ~ description              = "Outbound Traffic" -> "SSH from Gitlab Runners"
      ~ from_port                = 0 -> 22 # forces replacement
      ~ id                       = "sgrule-3520689800" -> (known after apply)
      - ipv6_cidr_blocks         = [] -> null
      - prefix_list_ids          = [] -> null
      ~ protocol                 = "-1" -> "tcp" # forces replacement
        security_group_id        = "sg-09976fde26e3db3ea"
        self                     = false
      + source_security_group_id = (known after apply)
      ~ to_port                  = 0 -> 22 # forces replacement
      ~ type                     = "egress" -> "ingress" # forces replacement
    }

  # module.sg_common.aws_security_group_rule.source_cidr[3] will be created
  + resource "aws_security_group_rule" "source_cidr" {
      + cidr_blocks              = [
          + "0.0.0.0/0",
        ]
      + description              = "Outbound Traffic"
      + from_port                = 0
      + id                       = (known after apply)
      + protocol                 = "-1"
      + security_group_id        = "sg-09976fde26e3db3ea"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 0
      + type                     = "egress"
    }

There was an error on apply, however:

Error: Error revoking security group sg-09976fde26e3db3ea rules: InvalidPermission.NotFound: The specified rule does not exist in this security group.
	status code: 400, request id: b0bba002-bbd7-4118-b47e-7822328109fa


Error: [WARN] A duplicate Security Group rule was found on (sg-09976fde26e3db3ea). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error message: the specified rule "peer: 0.0.0.0/0, ALL, ALLOW" already exists

  on .terraform/modules/sg_common/modules/security-group/main.tf line 12, in resource "aws_security_group_rule" "source_cidr":
  12: resource "aws_security_group_rule" "source_cidr" {

Upon investigation, it appears that terraform is sending the wrong parameters in the RevokeSecurityGroupEgress API call to AWS which results in the first error. From cloudtrail:

"eventSource": "ec2.amazonaws.com",
"eventName": "RevokeSecurityGroupEgress",
"awsRegion": "ap-northeast-1",
"sourceIPAddress": "[removed]",
"userAgent": "aws-sdk-go/1.19.36 (go1.12.5; linux; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.0",
"errorCode": "Client.InvalidPermission.NotFound",
"errorMessage": "The specified rule does not exist in this security group.",
 "requestParameters": {
        "groupId": "sg-09976fde26e3db3ea",
        "ipPermissions": {
            "items": [
                {
                    "ipProtocol": "tcp",
                    "fromPort": 0,
                    "toPort": 0,
                    "groups": {},
                    "ipRanges": {
                        "items": [
                            {
                                "cidrIp": "0.0.0.0/0",
                                "description": "Outbound Traffic"
                            }
                        ]
                    },
                    "ipv6Ranges": {},
                    "prefixListIds": {}
                }
            ]
        }
    },

"ipProtocol": "tcp", is the new value of that security group rule as seen in the plan above, not the existing one. Terraform should be sending "ipProtocol": "-1", in this request to revoke the existing rule. This then results in AWS being unable to match this egress rule to the existing one and it cannot delete it, further resulting in a duplicate security group rule error when it tries to recreate the egress rule.

Steps to Reproduce

Add a security group rule to an existing security group that causes an egress rule to be deleted.

Additional Context

Terraform is running via docker via gitlab runner.

References

@ghost
Copy link

ghost commented Jun 25, 2019

This is still an issue as of provider version 2.16.0

@aeschright aeschright added bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 17, 2019
@michcio1234
Copy link

It's still an issue with Terraform 0.12.20 and aws provider 2.50.0

@dshkyra
Copy link
Contributor

dshkyra commented Mar 7, 2020

The issue seems to be related to terraform 0.12. The same code successfully applies with terraform 0.11 but produces an error with terraform 0.12.

Below is my minimal example:

resource "aws_security_group_rule" "sg_rule" {
  security_group_id = "<security-group-id>"

  type        = "egress"
  from_port   = 0
  to_port     = 0
  protocol    = "udp"

  cidr_blocks = [ "0.0.0.0/0" ]
}

When I change protocol (e.g. from udp to tcp) terraform 0.11 (v0.11.14) successfully applies the change, however terraform 0.12 (v0.12.20) fails with InvalidPermission.NotFound: The specified rule does not exist in this security group.

In both cases I use AWS provider v2.32.

@engie-tawfik
Copy link

engie-tawfik commented Apr 28, 2020

I'm seeing a very strange behaviour with Terraform v0.12.24 + provider.aws v2.56.0.

I ran $ terraform apply, didn't change anything, ran it again to verify that no changes would take place, but I was hit by the security group forcing replacement of my EC2 servers. It is the only security group present in my code, and it has only two ingress rules and one egress rule.

What I couldn't understand is that inside the EC2 creation, Terraform can already recognise the existing security group ID, however it insists on replacing it.

 ~ vpc_security_group_ids       = [
          - "sg-081b9d9f9e40d0b4d",
        ] -> (known after apply)

It's worth mentioning that I have a root module, and three other modules

root
| - Networking
| - Compute
| - Storage

The security group variable is:

  1. output from Networking module,
  2. defined as an empty variable in the variables.tf of the Compute module, and
  3. read by the root module as output of the Networking module and passed to the Compute module when calling it.

Here is security group code:

# SG
resource "aws_security_group" "tf_sg" {
  name = "tf_sg"
  description = "All in one Security Group for SSH, HTTP, etc.."
  vpc_id = "${aws_vpc.tf_vpc.id}"

  # 1. SSH
  ingress {
    from_port = 22
    protocol = "tcp"
    to_port = 22
    cidr_blocks = ["${var.sourceip}"]
  }

  # 2. HTTP
  ingress {
    from_port = 80
    protocol = "tcp"
    to_port = 80
    cidr_blocks = ["${var.everywhere}"]
  }

  # all out permitted, -1 is all protocols including: tcp, upd, icmp, ...
  egress {
    from_port = 0
    protocol = "-1"
    to_port = 0
    cidr_blocks = ["${var.everywhere}"]
  }

I am afraid this is very dangerous and invalidates many use cases involving EC2 and SG.

@PCjrBasic
Copy link

I can confirm this is still an issue as of terraform v0.12.29 and provider.aws v3.0.0. We were using ingress rules instead of egress rules. Terraform was sending a delete command to AWS to delete a rule that didn't exist - a rule based on the new attributes rather than the old attributes like it should have been. I can recreate the problem by taking any existing security group rule and modifying the rule. Once it happens, I can run as many "terraform apply" as I want and the problem won't clear. It appears I can avoid the problem by changing the name of the security group rule when making the change - then terraform views the operation as a delete and a create, not a replace. The problem can also be avoided by first doing a delete, applying it, and then creating and applying.

@PCjrBasic
Copy link

After further testing, it appears this doesn't impact all replacements as I previously though. It appears to only impact rules where the protocol is changed. Changes to the port, cidr, or description don't seem to impact it. When I change the protocol though (tried tcp -> all and tcp -> udp), I seem to be able to reproduce it 100% of the time.

@github-actions
Copy link

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

@github-actions github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Aug 10, 2022
@esilva-everbridge
Copy link

Bumping for visibility, any progress on this?

@github-actions github-actions bot removed the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Aug 10, 2022
@github-actions github-actions bot added this to the v4.29.0 milestone Aug 31, 2022
@github-actions
Copy link

github-actions bot commented Sep 2, 2022

This functionality has been released in v4.29.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
Copy link

github-actions bot commented Oct 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants