Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipaddress: Shared Address Space (100.64.0.0/10) is neither private nor global #119812

Open
jstasiak opened this issue May 30, 2024 · 7 comments
Open
Labels
stdlib Python modules in the Lib dir

Comments

@jstasiak
Copy link
Contributor

jstasiak commented May 30, 2024

Something I discovered when working on GH-65056 and GH-113171.

IPv4Address' and IPv4Network's is_global and is_private are false at the same time for addresses in this range:

>>> ipaddress.IPv4Address('100.64.0.0').is_global
False
>>> ipaddress.IPv4Address('100.64.0.0').is_private
False
>>> ipaddress.IPv4Network('100.64.0.0/10').is_global
False
>>> ipaddress.IPv4Network('100.64.0.0/10').is_private
False

I don't believe this is right and I'll explain why.

Historical context

Initial additions

is_private introduced in dc9b255 ("Issue #14814: addition of the ipaddress module (stage 1 - code and tests)") on 2012-05-20.

It only handled the actual private-use ranges for IPv4 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).

Documented as such.

100.64.0.0/10 handling and is_private semantics changes

There have been four ipaddress patches involving this range:

  1. 22c3176 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400; ipaddress should make it easy to identify rfc6598 addresses")
  2. be9c1b1 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400: fix documentation, add cache to is_global and correctly handle 100.64.0.0/10")
  3. e5019d5 ("bpo-26730: Fix SpooledTemporaryFile data corruption #17400: correct handling of 100.64.0.0/10, fixing the docs and updating NEWS")
  4. 742192a ("Issue bpo-41205: Document Decimal power 0 to the 0 #21386: Implement missing IPv4Address.is_global property")

The first three were part of GH-61602 (https://bugs.python.org/issue17400), the fourth one just adds a missing IPv4Address.is_global property to match the IPv4Network.is_global semantics and follows the semantics established in commits 1-3.

Commit 1 changed the semantics of is_private from "Is this a Private-use IPv4 address or a Unique-Local IPv6 address?" to roughly "Do [1] and [2] say globally reachable = false for this address". The documentation change was not quite precise

         A boolean, True if the address is reserved per
        iana-ipv4-special-registry or iana-ipv6-special-registry.

but the intent of the implementation is quite clear.

Commit 1 also added is_global that was effectively not is_private.

In commit 3 an exception for 100.64.0.0/10 is made and both is_global and is_private are false for that range.

is_global/is_private semantics clarification

As part of GH-65056 we changed the documentation to make it clear that we follow the "globally reachable" information from IANA (with some caveats, like special handling of IPv4-mapped IPv6 addresses, see 83f0f8d ("bpo-33433 Fix private address checking for IPv4 mapped IPv6. (GH-26172)")).

The problem

The motivation for handling 100.64.0.0/10 like this can be found here:

#61602 (comment)

The rationale for RFC 6598 is precisely that 100.64.0.0/10 is not private in the common sense, so it would deserve a different treatment in the ipaddress module as well.

I have to admit I don't find it convincing enough to make an exception for it.

[1] says the range is not globally reachable so in my opinion is_private should return true for it as it does for the rest of the not globally reachable address blocks (again, with the exception of IPv4-mapped IPv6 address handling).

I'd find is_private being false for the range surprising if I wasn't clearly aware of the semantics after reading this code multiple times. I believe it may lead to real-world issues.

Additionally the behavior where is_private is not the opposite of is_global is likely to be surprising too.

In short: IMO there should be no exception.

The downside of the proposed solution: this is technically breaking backwards compatibility if some code depends on the current semantics. I'm not sure I'd classify the proposed change strictly as a bug fix.

[1] https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
[2] https://www.iana.org/assignments/iana-ipv6-special-registry/iana-ipv6-special-registry.xhtml

Linked PRs

@encukou
Copy link
Member

encukou commented May 31, 2024

It seems to me that if is_private is always supposed to be exactly the opposite of is_global, then one of them is redundant and should be deprecated.

@jstasiak
Copy link
Contributor Author

Yes, agreed. If the Shared Address Space exception is removed I'll be happy to push that agenda.

ccing @pitrou @ncoghlan – you participated in the GH-61602 discussion, wondering what your thoughts are.

@encukou
Copy link
Member

encukou commented Jun 3, 2024

If the Shared Address Space exception is removed I'll be happy to push that agenda.

If we decide to deprecate something, it should keep its existing warts.
Both a breaking change and a deprecation mean that we're asking users to adapt their code. We want them to only do that once.

@jstasiak
Copy link
Contributor Author

jstasiak commented Jun 3, 2024

Are you saying we should should deprecate and eventually remove is_private and the removal will effectively get rid of the problem of the 100.64.0.0/10 exception?

IMO such deprecation and removal would have to be connected to a claim about the Shared Address Space exception being needless or unwarranted.

Please help me understand what should happen here (a hypothetical sequence or a timeline of events could be useful).

@encukou
Copy link
Member

encukou commented Jun 4, 2024

This needs an expert, or someone looking at the usage and use cases and related context, to determine if this is a problem that needs fixing. Right now I can't commit the necessary time to do this.

If it should be fixed, with a deprecation we can avoid changing existing behaviour and make the (non-deprecated) API smaller/simpler.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 4, 2024

Re-reading the old issue, I'm still of the view that since there are 3 categories of interest (global, private, private-to-carrier), there should be three flags rather than having the third category be soley implied by the first two flags both being false. The fact that one of the categories is much smaller than the other two shouldn't be a major consideration.

Peter didn't want a public is_carrier_private flag for a single network range though, and he was the lead module maintainer at the time. (I also genuinely didn't have a strong preference on the topic, so I have no idea if he might have relented given further consideration of the documentation challenges inherent in not explicitly naming the third category)

I still think it would be wrong to make is_private True for these ranges though (for the reasons given in the original issue). It's genuinely the case that is_private and not is_global don't mean exactly the same thing (is_private excludes private-to-carrier addresses, while not is_global includes them).

If the documentation of is_private has changed such that it suggests that doing so would be correct, then it's the documentation that has diverged from the code's intentional behaviour, not the code.

@ncoghlan
Copy link
Contributor

ncoghlan commented Sep 7, 2024

The is_private/is_global CVE fix going out brought this to my attention again.

Considering options for new properties, I wonder if the following might be the best option:

  1. Document that the is_global and is_private flags are short for is_global_for_all and is_private_for_all, so it's possible for them both to be False when the same address is considered private for some use cases, but global for others.
  2. Add is_global_for_endpoints and is_private_for_endpoints
  3. Add is_global_for_carriers and is_private_for_carriers

There's no way for the standard library to know how the carrier shared address space should be handled in a given application unless it's told whether it's an endpoint use case or a carrier use case.

If we went down this road, the four new property implementations would be:

@property
def is_global_for_endpoints(self):
    return not self.is_private  # Carrier shared addresses are global for endpoints

@property
def is_private_for_endpoints(self):
    return self.is_private  # Addresses are only private for endpoints if they're private for everyone

@property
def is_global_for_carriers(self):
    return self.is_global  # Addresses are only global for carriers if they're global for everyone

@property
def is_private_for_carriers(self):
    return not self.is_global  # Carrier shared addresses are private for carriers

It feels like a fairly heavy solution for a relatively rare case, but I'm thinking that "Complex is better than complicated" applies here: exposing separate property pairs for endpoint use cases and carrier use cases (in addition to the existing "for all" flag attributes) forces people to decide which pair they want for their use case, but once you've made that decision, you can just ignore the other two property pairs.

The main alternative to that "choose a pair of properties and use them consistently" solution is the following:

# Existing `is_global` setting means "Is global for everyone"

@property
def is_global_for_endpoints(self):
    return not self.is_private  # Carrier shared addresses are global for endpoints

@property
def is_private_for_carriers(self):
    return not self.is_global  # Carrier shared addresses are private for carriers

# Existing `is_private` setting means "Is private for everyone"

While superficially simpler (only 2 new properties rather than 4), this feels more complicated to use in practice, since endpoint use cases will want the is_global_for_endpoints and is_private pair, while carrier use cases will want the is_global and is_private_for_carriers pair. Even identifying that those are the valid pairs assumes a reader already understands the problem we're wanting to provide a clearer API for.

@picnixz picnixz added the stdlib Python modules in the Lib dir label Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

4 participants