Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36979 - Change cdn_ssl_version setting to cdn_min_tls_version #10824

Closed
wants to merge 1 commit into from

Conversation

wbclark
Copy link
Contributor

@wbclark wbclark commented Dec 12, 2023

This uses the min_version attribute from Net::HTTP instead of ssl_version, which in turn uses OpenSSL::SSL::SSLContext#min_version and not the older OpenSSL::SSL::SSLContent#ssl_version. As a result the there is no longer a blanket SSLv23 setting and the data format for specifying TLS versions has changed.

It also sets a default value of the new setting for TLSv1.2, which is reasonable because older versions of the standard are deprecated. Users requiring a deprecated protocol version due to their network infrastructure can lower this value as needed.

Considerations taken when implementing this change?

  1. What is a reasonable default value? I went with TLSv1.2 since TLSv1.1 and older standards are officially deprecated
  2. How to best handle the difference in allowed values for min_version and ssl_version in OpenSSL, since ssl_version is deprecated? I opted to introduce a new setting entirely since the functionality has changed and we can't directly map the old allowed values onto the new allowed values.
  3. I removed the related tests from test/models/setting_test.rb as it seems that test/lib/resources/cdn_test.rb provides adequate coverage.

What are the testing steps for this pull request?

  1. Test uploading a manifest, syncing from CDN, etc
  2. Test inter-server sync from a server that has newer TLS disabled... this should fail because older TLS is now blocked by default
  3. Lower the CDN min TLS version value and re-try inter-server sync. Now it should succeed once older TLS has been permitted.

Open Questions

  1. Is there any need for continued support for insecure protocols such as SSLv3 ?
  2. Is a DB migration needed to clean up the removed setting? I tested giving a non-nil value to the older setting before applying the change, it doesn't seem to cause any issue, and it looks like there is precedent for skipping this, as in 283c63d

@wbclark wbclark changed the title Fixes 36979 - Change cdn_ssl_version setting to cdn_min_tls_version Fixes #36979 - Change cdn_ssl_version setting to cdn_min_tls_version Dec 12, 2023
@wbclark wbclark force-pushed the 36979_cdn_ssl_min_version branch 3 times, most recently from bfc17ac to 6583cca Compare December 20, 2023 20:44
@wbclark
Copy link
Contributor Author

wbclark commented Dec 20, 2023

I disabled the TLSv1.3 unit test because CI infra doesn't know about TLSv1.3.

I set TLSv1.2 as the default minimum version (the previous feature had no default)

@wbclark wbclark force-pushed the 36979_cdn_ssl_min_version branch 6 times, most recently from 7f2f8b9 to 0a9dd1e Compare December 21, 2023 22:47
@wbclark
Copy link
Contributor Author

wbclark commented Jan 2, 2024

Requesting reviews from @ehelms for general protocol wizardry and from @hao-yu and @pmoravec for familiarity with the related downstream bug

@pmoravec
Copy link
Member

pmoravec commented Jan 2, 2024

Dunno about impact on upgrades (i.e. does db migration make sense?), and apart of the typo it sounds great!

@wbclark
Copy link
Contributor Author

wbclark commented Jan 2, 2024

Dunno about impact on upgrades (i.e. does db migration make sense?), and apart of the typo it sounds great!

That's a good question.

I tested this by setting a value for the original setting then applying my patch and restarting the server, and found that it didn't cause any immediate issue.

I also found the earlier commit 283c63d which removes a setting without any migration to remove the previous value from the DB.

Maybe @ehelms or @parthaa can say if there is any reason a migration should be included with this one.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 2, 2024

I pushed an update which fixes the typos.

@ehelms
Copy link
Member

ehelms commented Jan 2, 2024

I disabled the TLSv1.3 unit test because CI infra doesn't know about TLSv1.3.

What were the issues? Are there are other reasons not to defer to 1.3 as the default?

fail("Invalid SSL version specified. Check the 'CDN SSL Version' setting")
@cdn_min_tls_version = Setting[:cdn_min_tls_version]
if @cdn_min_tls_version && !SUPPORTED_TLS_VERSIONS.include?(@cdn_min_tls_version)
fail("Invalid TLS min_version specified. Check the 'CDN Minimum Allowed TLS Version' setting")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will users see this failure? Does it propagate to the UI and API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other tricky part being that an organization admin will run into this issue and then have to contact a Satellite admin in order to modify this. I realize that's how the current functionality already works, but my point being we should make sure this is as actionable by a user encountering it as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail() should cause an ISE. I'm not sure if one would need to look at production.log to see the exact message.

That said, I think the only way this condition could be met is if the user has a value already set, and then that value later becomes unavailable (an upgrade to ruby or to the operating system removes the currently configured TLS version).

As long as we test on all supported platforms, I think our testing would catch that and appropriate intervention would be made before an end user would ever face that issue.

I kept and updated the check from the previous implementation (which seems also unlikely to be reached in real world usage) just in case.

If you still have concerns about it, maybe we can create a new issue to follow up? Let me know what you think, thanks.

@ekohl
Copy link
Member

ekohl commented Jan 18, 2024

How many users actually want this feature? It was introduced for compatibility. Since we moved to EL8 the SSLv3 option doesn't do anything (since OpenSSL 1.1.0+ is compiled without it) and I don't think people asked about this. I'd argue simplifying the code base is more important than the edge case where someone wants so specific control over the connections. And if they do, why only this one

@wbclark
Copy link
Contributor Author

wbclark commented Jan 18, 2024

How many users actually want this feature? It was introduced for compatibility. Since we moved to EL8 the SSLv3 option doesn't do anything (since OpenSSL 1.1.0+ is compiled without it) and I don't think people asked about this. I'd argue simplifying the code base is more important than the edge case where someone wants so specific control over the connections. And if they do, why only this one

Well, there was a downstream BZ so someone is definitely trying to use it. I'd defer to @pmoravec about whether this is potentially useful or important since he has deep knowledge of user deployments and different scenarios they are facing, and to @ianballou or @parthaa about whether there's a concrete benefit for the health of the katello code base to remove it.

@ekohl
Copy link
Member

ekohl commented Jan 18, 2024

Well, there was a downstream BZ so someone is definitely trying to use it. I'd defer to @pmoravec about whether this is potentially useful or important since he has deep knowledge of user deployments and different scenarios they are facing, and to @ianballou or @parthaa about whether there's a concrete benefit for the health of the katello code base to remove it.

https://bugzilla.redhat.com/show_bug.cgi?id=2216445 is from @pmoravec and the problem there is that the current default is less secure than if we dropped it (and can cause problems). I'll let him answer if allowing TLS 1.0+ (which will include TLS 1.3) but not giving explicit control is (as I suggested) solving the issue.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 18, 2024

https://bugzilla.redhat.com/show_bug.cgi?id=2216445 is from @pmoravec and the problem there is that the current default is less secure than if we dropped it (and can cause problems). I'll let him answer if allowing TLS 1.0+ (which will include TLS 1.3) but not giving explicit control is (as I suggested) solving the issue.

The new setting would give users the option of reducing the min_version if they absolutely need to for compatibility with other infrastructure (syncing from the CDN via an older proxy, syncing from an older katello server using ISS) but by default it would enforce the more secure TLS1.3 standard.

Given the existence of upgrade paths that involve building a new server on a newer OS and migrating content via ISS, which we've done in the past and could do again in the future, I think it's better to allow admins to quickly and easily modify the allowed min_tls version in the UI, rather than the more error-prone and higher risk process of modifying the system wide crypto policy.

Given that the current version of cdn_ssl_version has default value nil, I think this means that the user must have tried to enable the setting to cause the issue (indicating to me that at least someone is trying to use this?)

Also as you already pointed out, Net::http has ssl_version (which we are setting nil unless the user changes that default) and min_version, corresponding to ssl_version (deprecated) and min_version in OpenSSL::SSL::SSLContext. So what really happens today would depend on the issue you discovered and fixed with ruby/openssl#710 , which would suggest we are allowing the insecure TLS 1.0 protocol for CDN syncs (...unless Net::http sets any defaults of its own instead of falling back to the ruby openssl library default?)

Anyway, to summarize my concerns about removing it:

  1. We don't know how long it will take for the ruby fix to be shipped, and your proposal could potentially delay us from fixing the issue of allowing insecure TLS 1.0 by default.

  2. We could set TLS 1.3 as the default today instead of waiting for the ruby fix to give users TLS 1.2 as a default

  3. It's easier and less error prone to set this via the UI than to modify system wide crypto policy, particularly for something like one-time ISS to migrate content during an upgrade.

  4. There was a use case for a setting like this in the past, and the stated reasons seem likely to persist today and into the future, which is supported by the bug report which says that someone was trying to use a non-default value.

@pmoravec
Copy link
Member

I don't have enough data from sufficiently many customers to claim how (or if) they use the feature in general. But I have seen a few of them where some weird proxy allowed just some specific kind of [authentication|SSLversion|http-only] that made communication impossible if not met.

So I would honour for more configurable / fine-tuned option rather than dropping the config and allow just TLS1.0+.

@ekohl
Copy link
Member

ekohl commented Jan 23, 2024

My main concerns are that this is just such a tiny aspect of all HTTP connections. Even communication to the CDN. Most of that happens via Pulp, so having such a setting is misleading to say the least.

I suspect that most users who implement policies around this want to follow some policy. For example, that TLS < 1.2 shouldn't be used at all. Or that they connect to a proxy that doesn't support some TLS version. But having that only for a single connection doesn't make sense to me.

Tomer was always on a quest to simplify the application by removing settings. That makes the application easier to use and I think this also falls in the same category.

  1. We don't know how long it will take for the ruby fix to be shipped, and your proposal could potentially delay us from fixing the issue of allowing insecure TLS 1.0 by default.

This patch would only partially solve it, not for the whole platform. I don't think a partial fix is worth all the complexity. Both in maintenance and testing, but also for users consuming the product.

2. We could set TLS 1.3 as the default today instead of waiting for the ruby fix to give users TLS 1.2 as a default

I don't think you really win a lot with this. TLSv1.2 is still considered secure and EL7 hosts don't have TLSv1.3 support, so you'd likely break a lot of deployments out of the box. Like where there's an EL7 HTTP proxy.

3. It's easier and less error prone to set this via the UI than to modify system wide crypto policy, particularly for something like one-time ISS to migrate content during an upgrade.

But what do you win with it? System wide crypto policies are probably better for compliance. For example, NIST SP 800-52 Rev. 2 mandates TLS 1.2 as the minimum version with support for TLS 1.3. So anyone who wants compliance with that should already set it via a system wide crypto policy.

4. There was a use case for a setting like this in the past, and the stated reasons seem likely to persist today and into the future, which is supported by the bug report which says that someone was trying to use a non-default value.

It was there to allow for a downgrade in security. It was introduced in 280a77f and https://projects.theforeman.org/issues/8441 tells us that some proxies lacked TLS 1.2 support. I don't think that's relevant anymore.

With my platform hat on I strongly feel we should drop the whole setting and that we should solve it project/product wide. That also means we need to verify Pulp uses the system wide policy too. From a UX and QE perspective it would also be easier.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 23, 2024

But what do you win with it? System wide crypto policies are probably better for compliance. For example, NIST SP 800-52 Rev. 2 mandates TLS 1.2 as the minimum version with support for TLS 1.3. So anyone who wants compliance with that should already set it via a system wide crypto policy.

I know that you are aware of this, but to emphasize for anyone else reading along, if we don't set the min_version then the system wide crypto policy is not used and we get an insecure protocol by default, and the user has no way to change it. For that reason alone, this PR should be merged immediately. Taking this step would not preclude further discussion about redesigning or dropping this feature in the future.

I don't think you really win a lot with this. TLSv1.2 is still considered secure and EL7 hosts don't have TLSv1.3 support, so you'd likely break a lot of deployments out of the box. Like where there's an EL7 HTTP proxy.

This is why I previously had TLSv1.2 as the default value, not TLSv1.3, although I can see the argument for setting TLSv1.3 anyway and accepting that users who can't use TLSv1.3 will need to intervene by lowering it to TLSv1.2. Either of these choices is preferable to the current status quo, and either of these choices is also preferable to dropping this feature immediately without going through design and planning, RFC, communicating the change, etc.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 23, 2024

N.B. I split this portion of my response into a separate comment, to emphasize that I see these details as being less urgent than the primary points I made above.

It was there to allow for a downgrade in security. It was introduced in 280a77f and https://projects.theforeman.org/issues/8441 tells us that some proxies lacked TLS 1.2 support. I don't think that's relevant anymore.

This patch would only partially solve it, not for the whole platform. I don't think a partial fix is worth all the complexity. Both in maintenance and testing, but also for users consuming the product.

So it was introduced in #4823 and some proxies at the time lacked TLS1.2 support. While it is true that this exact circumstance is no longer exactly the case right now, I think it's very likely that a similar circumstance will arise again in the future as the world gradually (and unevenly) migrates to future OSes and TLS protocol versions.

I think a partial fix is better than no fix at all. Previously this was configured via yaml and the installer which would have been a fine way to orchestrate katello and pulp for a consistent configuration, but that was removed and it was moved to the DB and controlled via settings API with #8552 , theforeman/puppet-katello#320 , and theforeman/foreman-installer#477 .

I agree that it's important to follow up with an investigation of how Pulp handles TLS, and that it would be beneficial for users to be able to configure Pulp and Katello to use the same protocol.

@ekohl
Copy link
Member

ekohl commented Jan 23, 2024

I think a partial fix is better than no fix at all

And I disagree with this. Remember that this is from the time when we didn't have unified proxy code. Today if you set up a proxy, you go through Foreman's HTTPProxy. That has code that patches every connection adapter.

Imagine this scenario. You set the minimum TLS version for the CDN to be 1.3, that works. Then later you set a global proxy that doesn't understand TLS 1.3. Suddenly the CDN connection breaks, but all other connections work. How do you expect the user and anyone supporting the user to figure out why that one aspect doesn't work?

My thought is that this current feature helps exactly nobody and introduces a maintenance and usability burden. It's making a tiny aspect of the application special and that's more likely to cause problems than solve any.

I'm not a Katello maintainer, but with my platform hat on I'd say the broken setting needs to be removed without replacement.

@ehelms
Copy link
Member

ehelms commented Jan 24, 2024

I am going to attempt to try and summarize what I think I understand the requirements to be to ensure I properly understand them before delving into the implementation details:

  1. Any TLS connection should default to any system-wide crypto policies.
  2. If no system-wide setting exists, TLS connections should default to the most secure protocol possible.
  3. Users should have a method by which to lower the TLS version in use in the event their infrastructure prevents the use of the defaults.

For #3, I avoided what I think is a key point, and requirement up for debate:

  • Should the user have to downgrade all connections or be able to selectively downgrade connections based on what is breaking?

For #1, I think this is a best practice that we should be designing for across the entirety of our application.
For #2, I think fall backs just make sense, and we should also strive to provide the securest method possible by default.
For #3, environments vary from use case to use case, and we best serve our customers by providing some supported and sane fallback that allows them to continue operations within reason. For example, we should not let the user downgrade the connection to something that is beyond deprecated.

I believe the design hinges on #3 and the point that is up for debate. I have no love for over doing it with settings. I am wary of a solution that would give users no way to adjust the configuration for their environment (even if we think they are making poor security choices).

Another way to put it is, if we opted not to provide a method to lower the TLS version, what would we tell a user who comes looking for help?

@ekohl
Copy link
Member

ekohl commented Jan 24, 2024

Thank you for the summary.

Any TLS connection should default to any system-wide crypto policies.

Let's clarify what system-wide crypto policies specify. It's the minimum TLS version, the maximum and acceptable ciphers. This PR doesn't touch ciphers, so I'll ignore those for now.

I'm going through it: comparing today, the current PR and my proposal where we drop the setting.

Today: The CDN connection does not respect minimum TLS version, but worse, it doesn't respect the maximum TLS version. It limits to TLS 1.0 exactly.

Current PR: The default value (nil) uses the Ruby default for the CDN connection, which doesn't respect the minimum TLS version (setting it to TLS 1.0) but does respect the maximum TLS version. Once my ruby-openssl fix makes it to production it will. Any selected value will divert from the system-wide crypto policy for the minimum version.

My suggestion: This would be equal to the default value of option 2.

If no system-wide setting exists, TLS connections should default to the most secure protocol possible.

I'm not sure if this is an option. RHEL 8 always has a system-wide setting for OpenSSL. The minimum TLS version is 1.2, maximum is 1.3. For context, Debian 11 (and I assume Debian 12) also has TLS 1.2 as the default minimum TLS version.

So I I'd say 1) and 2) are the same case.

Users should have a method by which to lower the TLS version in use in the event their infrastructure prevents the use of the defaults.

This is indeed the crux of the debate.

Another way to put it is, if we opted not to provide a method to lower the TLS version, what would we tell a user who comes looking for help?

If we look at crypto-policies then there's this for RHEL 9 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/security_hardening/using-the-system-wide-cryptographic-policies_security-hardening#excluding-an-application-from-following-the-system-wide-crypto-policies_using-the-system-wide-cryptographic-policies (and I realize quoting multiple sources may be confusing):

You can customize cryptographic settings used by your application preferably by configuring supported cipher suites and protocols directly in the application.

This would be what the current PR is implementing. But I'm questioning if it's adequate. If we implement this, what's the QE plan to verify it really does what it's supposed to?

@ianballou
Copy link
Member

Current PR: The default value (nil) uses the Ruby default for the CDN connection, which doesn't respect the minimum TLS version (setting it to TLS 1.0) but does respect the maximum TLS version.

I thought the default min value was hardcoded for the current PR?

default: Katello::Resources::CDN::SUPPORTED_TLS_VERSIONS.include?('TLSv1.3') ? 'TLSv1.3' : 'TLSv1.2',

Perhaps the code should change the setting default to nil instead of choosing TLS 1.2 or 1.3. That should use the system default, right?

Imagine this scenario. You set the minimum TLS version for the CDN to be 1.3, that works. Then later you set a global proxy that doesn't understand TLS 1.3. Suddenly the CDN connection breaks, but all other connections work. How do you expect the user and anyone supporting the user to figure out why that one aspect doesn't work?

This does indeed sound like a support annoyance, but isn't the setting for those few very special setups where the user should know they're doing something funky with their CDN connection anyway?

This is definitely one of those weird corner-case features, but since it deals with import/export there are probably users doing some interesting and strange things related to disconnected environments.

This would be what the current PR is implementing. But I'm questioning if it's adequate. If we implement this, what's the QE plan to verify it really does what it's supposed to?

I think the test would be to run through the original reproducer steps and verify that they now work:

Steps to Reproduce:
1. Set Administer -> Settings -> Content -> "CDN SSL version" to the highest possible version (TLSv1)
2. Set ISS from another Satellite 6.13 or higher: Content -> Subscriptions -> Manage Manifest -> CDN Configuration -> Network Sync -> provide upstream Sat details (incl. SSL debug cert)
3. Attempt to Update fails with some "tls mismatch" like error.

Besides turning this into a full-blown feature where we let you set the min and max TLS version for all connections via the HTTP Proxy model (or something like that), it sounds like the other options are to accept this PR as-is or remove the setting entirely.

Removing the setting entirely means saying "WONTFIX" to the disconnected users who have interesting TLS requirements, but we do gain some maintainability.
-> We apparently have at least one user who uses this setting, so can we remove it? Perhaps more disconnected folks use it?

I'm thinking removing the setting could cause complaints.

@ekohl
Copy link
Member

ekohl commented Jan 24, 2024

On mobile, so limited quoting capabilities. But on that last point: as I understand it today the setting can enforce TLS 1.0 (no less, no more, thus insecure) or the SSLv3 setting (which I think is SSLv3 or TLSv1, but can't check now). OpenSSL 1.1.0 included in EL8 does NOT have SSLv3 compiled in. That means essentially it's TLSv1 or nothing else. In other words, it's broken

@wbclark
Copy link
Contributor Author

wbclark commented Jan 25, 2024

Thanks for the thoughtful comments, everyone. So that we are all working from the same understanding, I want to clarify some very important points:

  1. The current setting min_ssl_version (what this PR replaces) is broken worse than we realized at first. The two possible values today are TLSv1.0 and nil (default), and both of these are broken.

1.a. If the value TLSv1.0 is used, then we are forcing an insecure, out of date protocol to be used. (This is the originally discovered issue, reported on Bugzilla by @pmoravec )

This happens because we are currently using (deprecated) Net::http#ssl_version= instead of Net::http#min_version=, and the deprecated method both enforces a single version instead of allowing a range of versions, and also can't set protocols newer than TLSv1.0. This first issue affects only those customers that attempt to use TLSv1.0 as the value of min_ssl_version instead of the default nil.

1.b. The second breakage (what @ekohl discovered in Ruby::OpenSSL) is that even when the setting is not used (the current default behavior, which is the nil value), we are still allowing (but not forcing) insecure TLSv1.0 to be used. Almost all users are affected by this flaw, regardless of any system-wide crypto policy they implement; the only users not affected by this version of the flaw, are the ones who are affected by the even worse version (1.a).

  1. There are two proposals for how to move forward:

2.a. (Ewoud's proposal) drop cdn_ssl_version, removing the need to fix 1.a by deprecating the feature. Do not take additional action to fix 1.b, just wait for the upstream Ruby fix (recently merged) to make it into our Ruby. For the time being, customers would not be able to lower the minimum TLS version, because they are already allowing insecure TLS versions and there is no lower to go. In the future, when Ruby starts respecting the system-wide crypto policy, then customers who require it could lower the minimum allowed TLS version only by modifying the system-wide crypto policy.

2.b. (This PR) drop cdn_ssl_version and replace it with cdn_min_tls_version fixing 1.a with a proper working implementation. Set a default value for cdn_min_tls_version so that 1.b is fixed immediately as well. Once Ruby starts respecting the system-wide crypto policy, we then still have the choice of dropping cdn_min_tls_version and system-wide crypto policy becomes the single source of truth (i.e. we pivot later to 2.a after getting the Ruby fix to 1.b and allowing more time to consider the cost/benefit of deprecating) OR we could introduce a new value Use system-wide crypto policy and make it the new default, which both respects the system-wide crypto policy by default and preserves the ability for users to selectively downgrade this connection within the application.

My evaluation is that 2.b is the better approach for now, because it fixes both issues right now, while still allowing us to pursue either design in the long term (i.e. once the Ruby bug is fixed). My increasing concern is that we are debating the design merits of 2.a vs. 2.b as long term ideals, when we should instead move forward with the approach that fixes both flaws today and have a separate conversation about potentially deprecating the feature in the future (ideally, not in a PR review).

What follows are in-line responses to various comments, which I've re-ordered for clarity. First, @ekohl

Current PR: The default value (nil) uses the Ruby default for the CDN connection, which doesn't respect the minimum TLS version (setting it to TLS 1.0) but does respect the maximum TLS version. Once my ruby-openssl fix makes it to production it will. Any selected value will divert from the system-wide crypto policy for the minimum version.

This could be the (at least partial) source of our disconnect? @ianballou got it right here:

I thought the default min value was hardcoded for the current PR?

default: Katello::Resources::CDN::SUPPORTED_TLS_VERSIONS.include?('TLSv1.3') ? 'TLSv1.3' : 'TLSv1.2',

That's correct. What I'm proposing is that we have to hardcode this for now to fix 1.b, and after sufficient time we could then either remove cdn_min_tls_version and only allow changes via the system-wide crypto policy OR we could introduce a new value, with a migration making it default, that respects the system-wide crypto policy out of the box but still allows the user to selectively downgrade the connection.

In response to @ehelms and @ianballou , who suggested

Any TLS connection should default to any system-wide crypto policies.

Perhaps the code should change the setting default to nil instead of choosing TLS 1.2 or 1.3. That should use the system default, right?

Neither 2.a nor 2.b would achieve this today, because if we don't explicitly set min_version here then Net::http won't set it either[1]; in that case, ruby's insecure hardcoded default[2][3] will be used.

[1] https://github.com/ruby/net-http/blob/edc99a54b2c2888759068e2627f2f26c5f505352/lib/net/http.rb#L1635-L1643
[2] ruby/openssl#709
[3] https://github.com/ruby/openssl/blob/fcda6cf9d5f69f6dc0c297ca76feff71f9021f00/lib/openssl/ssl.rb#L25

Until we have a Ruby version with Ewoud's fix, we must explicitly set min_version ourselves or else it will always permit the insecure, deprecated TLSv1.0 protocol, regardless of the system-wide crypto policy.

In response to these comments by @ehelms

Users should have a method by which to lower the TLS version in use in the event their infrastructure prevents the use of the defaults.

Should the user have to downgrade all connections or be able to selectively downgrade connections based on what is breaking?

This is a good question to ask about the future, and IMO strong arguments have been presented in favor of both approaches. With that said, we don't need to decide that right now, and it would be better to have that discussion on Discourse. Thinking about the below comments from @ianballou leads me to the same conclusion.

Removing the setting entirely means saying "WONTFIX" to the disconnected users who have interesting TLS requirements, but we do gain some maintainability.

And this is one reason why I think it's better to go through RFC/Discourse first if we were to go that route eventually, rather than deprecating a feature (even a broken one) in a way that could be perceived as a shortcut to closing a bug.

This is definitely one of those weird corner-case features, but since it deals with import/export there are probably users doing some interesting and strange things related to disconnected environments.

Besides turning this into a full-blown feature where we let you set the min and max TLS version for all connections via the HTTP Proxy model (or something like that), it sounds like the other options are to accept this PR as-is or remove the setting entirely.

And comments like these are the other reason I think we should take that aspect of the conversation to Discourse/RFC rather than making a hasty decision to deprecate a feature during a PR review. Deprecating a feature is a complex decision and should be carefully considered

@pmoravec
Copy link
Member

(2c from somebody who is not a typical developer)

In a long term, relying only on system-wide crypto policy might seem a better approach to me, since that policy covers all connections (not only from katello) that are required for getting some content. I have no experience with such policies but I assume it supersets any setting we (ideally) allow by this feature - but system-wide, not "katello speaks to CDN" scenario only. If the policies allow what this feature is supposed to allow, how much sense it has to replicate the same functionality, for fine-tuned user scenario? (not a rhetorical question)

Could there be user scenarios "we need lower SSL/TLS version just for this and that, but not globally / system-wide"? If not, we have an argument for dropping the feature. If so, we shall evaluate such corner cases if they deserve this feature.

That is for a long term. For a short term, until the ruby PR from @ekohl ++ gets into the ruby version that katello uses, we can treat this PR as a "hotfix". So (at least) for a short term, fixing this feature seems to me as a better option than dropping it - we can re-evaluate once the ruby fix gets to katello.

@ekohl
Copy link
Member

ekohl commented Jan 25, 2024

And comments like these are the other reason I think we should take that aspect of the conversation to Discourse/RFC rather than making a hasty decision to deprecate a feature during a PR review. Deprecating a feature is a complex decision and should be carefully considered

Just jumping on this: the current setting is harmful and there is no logical migration path from one to the other. It's not like you want to store the value that the user has today and transform it something else so we can drop it without deprecation IMHO. Having a discussion about a replacement is a good RFC topic though.

@ianballou
Copy link
Member

ianballou commented Jan 25, 2024

the current setting is harmful and there is no logical migration path from one to the other.

I haven't caught up with the other replies yet but this stood out -- I thought the setting in its current state did help those few people who used insecure proxies? Or is it actually totally broken?

Never mind this, Ewoud answered it above

@ianballou
Copy link
Member

Now that I understand the situation better, I get that we have an immediate need to bump up the minimum TLS version so that TLSv1 isn't used even for users who do not set the setting in question.

The current PR sets TLS to be a secure setting by default. It also lets users do insecure things for whatever unique environment they have. By that logic, I'm happy with the general strategy. Please let me know if I'm missing something.

I can do an actual functionality test on the code once there's a consensus.

@wbclark
Copy link
Contributor Author

wbclark commented Jan 25, 2024

the current setting is harmful and there is no logical migration path from one to the other.

The really important point to me is that the harm exists even if the setting is removed. I am worried about something like a MITM attack that exploits the (always allowed) insecure protocol. Maybe that needs to be chained with other exploits to compromise the supply chain. I am not comfortable with that risk.

It's not like you want to store the value that the user has today and transform it something else so we can drop it without deprecation

The current value of cdn_ssl_version is still stored in the DB if it was defined previously. IF we later decide to keep cdn_min_tls_version and introduce a new default value "Use system-wide crypto policy" then we have the option of a migration that considers both the stale cdn_ssl_version value (if any) and cdn_min_tls_version in determining whether to swap to the new default.

We aren't forced to do this, we still have (in the future) the option of dropping cdn_min_tls_version entirely.

For a short term, until the ruby PR from @ekohl ++ gets into the ruby version that katello uses, we can treat this PR as a "hotfix". So (at least) for a short term, fixing this feature seems to me as a better option than dropping it - we can re-evaluate once the ruby fix gets to katello.

👍 from me. I want to emphasize again that I am hearing some good arguments in favor of dropping cdn_min_tls_version in the long term. I think there are potentially some good arguments for keeping it too, but either way, since we can't drop it entirely right now, we don't need to decide that yet.

The current PR sets TLS to be a secure setting by default. It also lets users do insecure things for whatever unique environment they have. By that logic, I'm happy with the general strategy. Please let me know if I'm missing something.

This is how I see it too.

Sincerely, thank you to everyone for the time, effort, and patience involved in figuring this out.

@ekohl
Copy link
Member

ekohl commented Jan 25, 2024

The really important point to me is that the harm exists even if the setting is removed. I am worried about something like a MITM attack that exploits the (always allowed) insecure protocol. Maybe that needs to be chained with other exploits to compromise the supply chain. I am not comfortable with that risk.

The whole of the project is vulnerable to that. Why is the CDN connection special?

To perform a successful MITM attack you would also need to spoof the DNS resolution and the certificate. If you can do that, providing it over TLS 1.2 or 1.3 isn't a big challenge.

Preventing MITM can best be done by certificate pinning and verifying the certificate. AFAIK certificate pinning is what many mobile applications do and it turns out we already do this in Katello. cdn.redhat.com doesn't use a globally signed certificate so we bundle it in Katello (https://github.com/Katello/katello/blob/master/ca/redhat-uep.pem). I think the verification happens, but not entirely sure what this code does:

options.reverse_merge!(:verify_ssl => 9)

I would have expected it to use some constant, but value 9 is the bitwise OR of 1 and 8. Reverse engineering this, we know these are the various values:

  • 0 OpenSSL::SSL::VERIFY_NONE
  • 1 OpenSSL::SSL::VERIFY_PEER
  • 2 OpenSSL::SSL::VERIFY_FAIL_IF_NO_PEER_CERT
  • 4 OpenSSL::SSL::VERIFY_CLIENT_ONCE

Since 8 undefined, I think this is equivalent to OpenSSL::SSL::VERIFY_PEER so the certificate is verified.

We should change this magic 9 value to some constant instead. It was introduced in f9d4b10 so it's seriously been there for a long time.

So 'm not worried about MITM attacks because we apply better protections already. But perhaps I'm wrong and with TLS 1.0 you can MITM even with certificate verification.

The current value of cdn_ssl_version is still stored in the DB if it was defined previously.

If we do this, the applications raises a warning on every start up, which is a bad user experience.

IF we later decide to keep cdn_min_tls_version and introduce a new default value "Use system-wide crypto policy" then we have the option of a migration that considers both the stale cdn_ssl_version value (if any) and cdn_min_tls_version in determining whether to swap to the new default.

We aren't forced to do this, we still have (in the future) the option of dropping cdn_min_tls_version entirely.

How are you going to convert the current value? I assumed you can't really differentiate between the current value and the new, but now that you mention it, perhaps you can.

@ekohl
Copy link
Member

ekohl commented Feb 12, 2024

I forgot about this until @evgeni reminded me of it: ruby/net-http#55 means that HTTPS proxies are currently not supported. That means the only way I can think of why this was needed is Deep Packet Inspection. In the past many of these middleware boxes broke old TLS support. If you can't support TLS 1.2 these days, large parts of the internet are no longer reachable. So I wouldn't be really concerned with supporting them. I'd also say it's questionable if they were ever supported.

@wbclark
Copy link
Contributor Author

wbclark commented Feb 14, 2024

I forgot about this until @evgeni reminded me of it: ruby/net-http#55 means that HTTPS proxies are currently not supported. That means the only way I can think of why this was needed is Deep Packet Inspection. In the past many of these middleware boxes broke old TLS support. If you can't support TLS 1.2 these days, large parts of the internet are no longer reachable. So I wouldn't be really concerned with supporting them. I'd also say it's questionable if they were ever supported.

That's not my reading of https://bugs.ruby-lang.org/issues/16482 ... "Note that this issue is not about the connection that is routed through the proxy but about the connection to the proxy itself."

@ekohl
Copy link
Member

ekohl commented Feb 16, 2024

That's exactly my point. From the Foreman perspective, if you use a proxy today you MUST have:

Foreman -> HTTP -> Proxy -> HTTP(S) -> target

You can't have:

Foreman -> HTTPS -> Proxy -> HTTP(S) -> target

Foreman can't enforce a minimal TLS version on the Proxy -> target connection.

So cdn_min_tls_version would not apply to Pulp and not apply if you use a HTTP proxy. All in all I'm still convinced this setting is inadequate while adding a burden to both users and maintainers.

@pmoravec
Copy link
Member

I tend to agree it makes less sense to have a dedicated SSL/TLS-version setting just for one particular service or call flow, esp. if it has some further limitations mentioned above.

If there is a system-wide config option for the same, that would make much more sense (and be the argument to just remove this setting).

@wbclark
Copy link
Contributor Author

wbclark commented Feb 19, 2024

That's exactly my point. From the Foreman perspective, if you use a proxy today you MUST have:

Foreman -> HTTP -> Proxy -> HTTP(S) -> target

You can't have:

Foreman -> HTTPS -> Proxy -> HTTP(S) -> target

Foreman can't enforce a minimal TLS version on the Proxy -> target connection.

So cdn_min_tls_version would not apply to Pulp and not apply if you use a HTTP proxy. All in all I'm still convinced this setting is inadequate while adding a burden to both users and maintainers.

OK, and what if you don't have an HTTP proxy?

If there is a system-wide config option for the same, that would make much more sense (and be the argument to just remove this setting).

There is, but Ruby doesn't respect it currently, so unless we are explicit in specifying it ourselves, TLSv1.0 (an insecure, out of date protocol) is always used as the min_tls_version.

I think we've all been in agreement that this setting could be removed in the long term when the Ruby bug is fixed and the system wide crypto policy can actually be respected.

@ianballou
Copy link
Member

ianballou commented Feb 20, 2024

HTTPS proxies aside, this setting still enables syncing Foreman from some anciently configured Foreman that requires old TLS right? (ISS)

@ianballou
Copy link
Member

There hasn't been much movement for a while, and understandably so given the conversation.

However, in following the debate above on "keep the cdn_ssl_version setting" vs "remove the setting", I'd like to confirm:

Removing "cdn_ssl_version" will stop users from being able to sync an older (TLS v1) upstream Foreman with a newer downstream Foreman.

If this is true, I'd like more outside opinions about if we need to continue supporting this feature.

If it's false and either that feature is already broken or maybe I'm just misunderstanding the debate above, then I would be more in support of removing the setting entirely and raising the min tls version to the lowest secure version.

@wbclark
Copy link
Contributor Author

wbclark commented Mar 12, 2024

Closing this, as we addressed it with #10926

Thanks all!

@wbclark wbclark closed this Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants