Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHCP DOS Mitigation HLD in SONiC #1651

Merged
merged 16 commits into from
Aug 21, 2024

Conversation

muhammadalihussnain
Copy link
Contributor

@muhammadalihussnain muhammadalihussnain commented Apr 2, 2024

This Design Provides DHCP DOS Attack Mitigation Support in HLD

Repo PR title State
sonic-swss SwSS Chnages for DHCP DoS Mitigation Feature ![GitHub issue/pull request detail] (https://img.shields.io/github/pulls/detail/state/Azure/sonic-swss/3130 [img.shields.io])
sonic-buildimage YANG Support for DHCP DoS Mitigation ![GitHub issue/pullrequest detail] (https://img.shields.io/github/pulls/detail/state/Azure/sonic-buildimage/18873) [img.shields.io]
sonic-utilities Utilities Changes for DHCP DoS Mitigation Feature ![GitHub issue/pull request detail] (https://img.shields.io/github/pulls/detail/state/Azure/sonic-utilities/3301 [img.shields.io])
sonic-buildimage DHCP DoS Logger for DHCP DoS Mitigation Feature ![GitHub issue/pullrequest detail] (https://img.shields.io/github/pulls/detail/state/Azure/sonic-buildimage/18947) [img.shields.io]
sonic-mgmt Test Plan and Test Case for DHCP DOS Mitigation Feature ![GitHub issue/pullrequest detail] (https://img.shields.io/github/pulls/detail/state/Azure/sonic-mgmt/12869) [img.shields.io]

@muhammadalihussnain muhammadalihussnain changed the title DHCP Mitigation HLD in SONiC DHCP DOS Mitigation HLD in SONiC Apr 2, 2024

Proposed SONiC CLI commands (OPTION 1 - sonic-buildimage)
- config dhcp_relay mitigation-rate add [interface] [number of packets]
- config dhcp_relay mitigation-rate delete [interface] [number of packets]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From background cmd, the rate limit would take effect on UDP dport 67, which would also effect packet to dhcp_server (#1282), it would cause some confusion or misuse if it's under dhcp_relay sub-command.
sudo tc filter add dev [Interface] protocol ip parent ffff: prio 1 u32 match ip protocol 17 0xff match ip dport 67 0xffff police rate [Byte Rate] burst [Byte Rate] conform-exceed drop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested, under dhcp_relay subcommand it will cause confusion and misuse, we will consider the rate-limit directly on the interface.

yaqiangz
yaqiangz previously approved these changes Apr 22, 2024
Copy link
Contributor

@yaqiangz yaqiangz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ridahanif96 ridahanif96 self-requested a review April 22, 2024 04:45
@zhangyanzhao
Copy link
Collaborator

@ridahanif96
Copy link
Collaborator

ridahanif96 commented Apr 24, 2024

@dgsudharsan @prsunny @venkatmahalingam @lguohan We have explored rate limit via ACL. Currently ACL doesn't have capability of rate limit. COPP doesn't rate-limit at interface level.
can you pls provide your suggestions, regarding implementation of rate limit at interface level for ASIC.

@philo-micas
Copy link
Contributor

A large number of DHCPv6 messages may also cause the resources of the DHCP Server to be consumed. Whether to consider supporting the rate limit of DHCPv6 messages?

@muhammadalihussnain
Copy link
Contributor Author

Hi @dgsudharsan @prsunny @venkatmahalingam @lguohan We have modified design. We are proposing change in Behavior to work with this design by removing DHCP rate limit of 300 packets per second implemented through CoPP and replacing it with port-level rate limits implemented in the kernel via Linux Traffic Control.
Please review and suggest should we bring it again to Community review. We are open for design modifications/suggestions

@ridahanif96
Copy link
Collaborator

@zhangyanzhao Hi Yanzhao, can you assign reviewer for this design. We have updated HLD as per the comments/suggestions during HLD Community review.


Since traffic control(TC) only supports rates in the form of bytes per second, this value is multiplied by 406 (number of bytes that make up a DHCP discover packet).
Upon running this command, an ingress queuing discipline is created on the specified port via traffic control(TC). Next, a traffic control(TC) filter is added to filter DHCP discover packets on protocol 17 (UDP) and destination port 67 (port used by DHCP) and a dropping action is applied to filtered incoming traffic. Incoming DHCP discover packets that exceed the rate are dropped to stop the attack from overwhelming the DHCP server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incoming dropped packets are logged in the ipfilter log?
Maybe we can add something that if something passes over this threshold we should perform some extra action like raise a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Yarden-Z , No, Incoming dropped packets are not logged in the ipfilter log for Linux systems. iptables can do the task but then we have to mark those packets with traffic control utility and then using iptables we can log about those marked packets " drop by tc" . but tc does not have the ability to log.
Combination of tc and iptables can be used for logs but in this approach we can not drop the packets, which is the reqiurement to mitigate DHCP DoS Attack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then if this occurrence happens - how is the user notified of this?
If this is only done and not logged/notified - then a whole attack scenario might be missed.
We will block it, but we will not allow the users to have any information and data about their network and about a potential attack.
In addition - if this happens due to legitimate reasons (large cluster bring up) this might fail as well but without any info to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User can be notifiy via show command. In the show command, user can see the number of dropped packets on a specific interface.

}


### Testing Requirements/Design
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where will the test cases to check functionality be defined? Here there are only CLI test definitions

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Yarden-Z , yes the test cases mentioned here are for CLI which will be implemented in the sonic-utilities repo. Other than these, we plan on contributing test plans and test case code to the sonic-mgmt repo for functionality testing of this feature.


Proposed SONiC CLI commands (sonic-utilities)
- config interface dhcp-mitigation-rate add [port] [packet-rate]
- config interface dhcp-mitigation-rate delete [port] [packet-rate]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the [packet-rate] here? If we delete it then I assume we do not need to specify the packet rate value

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Yarden-Z ,
Yes, logically we do not need to specify the packet rate to delete it in this case, however, we implemented the command this way because this is the standard convention used in other SONiC commands (where the value to be removed is always specified in the command, eg. vlan del, interface IP remove, etc). We wanted to keep the command in line with the current CLI behavior of SONiC to maintain consistency.

@asraza07
Copy link

A large number of DHCPv6 messages may also cause the resources of the DHCP Server to be consumed. Whether to consider supporting the rate limit of DHCPv6 messages?

Hi @philo-micas , currently our design supports DHCPv4 only. In the future, we plan on adding support for DHCPv6 rate-limiting.

@asraza07
Copy link

Hi @yaqiangz @Yarden-Z , We have updated the HLD according to suggestions. Kindly help review it.

@ridahanif96
Copy link
Collaborator

@Yarden-Z Hi Yarden, We have updated HLD PR. can you please help review updated HLD and linked code PRs. Thanks in advance.

@ridahanif96
Copy link
Collaborator

@yaqiangz Hi Yaqiang, We have updated HLD PR. can you please help review updated HLD and linked code PRs. Thanks in advance.

Since traffic control(TC) only supports rates in the form of bytes per second, this value is multiplied by 406 (number of bytes that make up a DHCP discover packet).
Upon running this command, an ingress queuing discipline is created on the specified port via traffic control(TC). Next, a traffic control(TC) filter is added to filter DHCP discover packets on protocol 17 (UDP) and destination port 67 (port used by DHCP) and a dropping action is applied to filtered incoming traffic. Incoming DHCP discover packets that exceed the rate are dropped to stop the attack from overwhelming the DHCP server.

A custom program (daemon) has been developed to monitor network traffic. This daemon continuously checks all network ports (tc qdisc) for signs of packet drops. If it detects increase in dropped packets on a specific port, it logs a message to a designated file. This log message serves as an alert for the user, notifying them about potential congestion and packet loss issues on the affected port.
Copy link
Contributor

@yaqiangz yaqiangz May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a copp configuration exist, I guess to user, the min value of limit by tc and copp would take effect. If copp limit is lower than tc, what would the monitor daemon perform? 2 scenarios I am curious: 1) copp limit <packets rate < tc limit. 2) copp limit < tc limit < packets rate

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yaqiangz , yes you are right, if both TC and COPP are configured, the minimum value will take effect. For now, the monitor daemon only reads packets dropped due to TC. If a lower COPP rate is configured, then packets will be dropped by COPP, not TC. In this case the monitor daemon will not report dropped packets.

Scenario 1: copp limit < packets rate < tc limit

In this scenario, only COPP will drop packets and the TC limit will go unused as packets rate is less than TC limit, so the monitor daemon will not report packets dropped by COPP.

Scenario 2: copp limit < tc limit < packets rate

In this scenario, since the COPP limit is lower than TC and packets go through COPP first (since it is in hardware) only COPP limit will take effect, so the monitor daemon will not report packets dropped by COPP.

To avoid this, our design is aimed to replace the default COPP DHCP rate limit of 300 so that by default COPP and TC do not exist at the same time. We did this because TC allows for more precise rate limiting at an interface level so it holds an advantage over COPP in the event of a DoS attack.

If you have any suggestions on how we can modify the monitor daemon to also record COPP dropped packets, then please share, we will try to add it to our design.

Kindly also review the linked code PRs to this HLD, thanks in advance.

"lanes": "25,26,27,28",
"mtu": "9100",
"speed": "40000",
"dhcp_rate_limit" : "300"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether the unit of rate same as copp? If in COPP configure the cir and cbs are 300, are they mean same rate limit with your configuration?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yaqiangz , yes it is the same rate unit. In COPP, the default rate limit is 300 packets/sec which is the same rate limit in our configuration. Since we plan on replacing default COPP with our configuration, we chose to give a default value of 300 packets/sec to ensure backward compatibility.

Copy link
Contributor

@yaqiangz yaqiangz May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you will deprecate DHCP copp configuration? I didn't find it in current PRs, hence I assume that currently we accept that DHCP COPP would exist for now. My below questions are based on the assumption.

I checked change you added in db_migrater https://github.com/sonic-net/sonic-utilities/pull/3301/files. If upgraded from a version that doesn't support/configure tc rate limit, it would be configured as 300. You mentioned that by default COPP and TC do not exist at the same time. But from current PRs they will exist at the same time? In this scenario, copp is set to 300 or 100, and tc rate limit is 300. https://github.com/sonic-net/sonic-buildimage/blob/74317df1b07f778d8b7b21159afc92a7583c1760/files/image_config/copp/copp_cfg.j2#L37. TC configuration and log wouldn't take effect. Would it be better to keep it empty when migrating if previous OS didn't contain this value in the scenario that DHCP copp exist? And add this configuration on demand

Based on above, if we agree to keep empty by default for this configure. I wonder whether the log daemon process run all the time even if there is not any tc limit exist?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yaqiangz , we had planned to depreciate the default DHCP copp configuration as mentioned in our HLD document. We have now made the PR and linked it to this HLD. Our design ensures that by default only TC rate limits will be configured so that both TC and COPP do not exist at the same time. In this scenario, the log daemon process will run properly with no complications.

@zhangyanzhao
Copy link
Collaborator

late coming features with multiple code PRs, defer to next release per release triage

@asraza07
Copy link

asraza07 commented Jun 14, 2024

Hi @Yarden-Z @yaqiangz , kindly help review this HLD. Thanks in advance!

@yaqiangz
Copy link
Contributor

Hi @Yarden-Z @yaqiangz , kindly help review this HLD. Thanks in advance!

Hi @asraza07, I want to confirm that with COPP existing, this proposal would only take effect in the scenario: [tc rate limit < actual packet rate < COPP rate limit]. And in my understanding, current proposal cannot mitigate ddos attack that [actual packet rate > COPP rate limit], correct? I wonder whether you have plan or insights for this scenario?

@asraza07
Copy link

Hi @Yarden-Z @yaqiangz , kindly help review this HLD. Thanks in advance!

Hi @asraza07, I want to confirm that with COPP existing, this proposal would only take effect in the scenario: [tc rate limit < actual packet rate < COPP rate limit]. And in my understanding, current proposal cannot mitigate ddos attack that [actual packet rate > COPP rate limit], correct? I wonder whether you have plan or insights for this scenario?

Hi @yaqiangz , you are correct, DHCP DoS mitigation is only possible in the scenario where actual packet rate is less than the COPP rate limit. We have designed this feature on the premise that COPP is not intended to mitigate DHCP DoS attacks; it is there to regulate the flow of control plane and ensure control plane stays intact. Our feature is there to mitigate DHCP DoS attacks (only possible via the kernel). Without removing COPP, we can say that DHCP DoS mitigation is only possible when packet rate does not exceed COPP limit, because otherwise COPP starts dropping DHCP packets including legitimate clients.

Copy link
Contributor

@Yarden-Z Yarden-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok

Copy link
Contributor

@yaqiangz yaqiangz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhangyanzhao zhangyanzhao merged commit 8c39d0c into sonic-net:master Aug 21, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants