Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove RawLiteralType synthetic type #6121

Closed

Conversation

Michael0x2a
Copy link
Collaborator

This diff changes how we track raw literal types in the semantic analysis phase. It makes the following changes:

  1. Removes the RawLiteralType synthetic type.

  2. Adds a new TypeOfAny: TypeOfAny.invalid_type as suggested in Better error message for "Invalid type" #4030.

  3. Modifies AnyType so it can optionally contain a new RawLiteral class. This class contains information about the underlying literal that produced that particular TypeOfAny.

  4. Adjusts mypy to stop recommending using Literal[...] when doing A = NewType('A', 4) or T = TypeVar('T', bound=4).

    (The former suggestion is a bad one: you can't create a NewType of a Literal[...] type. The latter suggestion is a valid but stupid one: T = TypeVar('T', bound=Literal[4]) is basically the same thing as T = Literal[4].)

    This resolves Regression: error messages for literal expressions in invalid places are worse #5989.

The net effect of this diff is that:

  1. RawLiteralTypes no longer leak during fine-grained mode, which should partially help unblock Add tests for Literal types with incremental and fine-grained mode #6075.

  2. The way mypy handles literal expressions in types is "inverted".

    Previously, we by default assumed literal expressions would belong inside Literal[...] and tacked on some logic to make them convert into error AnyTypes. Now, we do the reverse: we start with an error AnyType and convert those into Literal[...]s as needed.

    This more closely mirrors the way mypy used to work before we started work on Literal types. It should also hopefully help reduce some of the cognitive burden of working on other parts of the semantic analysis code, since we no longer need to worry about the RawLiteralType synthetic type.

  3. We now have more flexibility in how we choose to handle invalid types: since they're just Anys, we have more opportunities to intercept and customize the exact way in which we handle errors.

    Also see Better error message for "Invalid type" #4030 for additional context. (This diff lays out some of the foundation work for that diff).

This diff changes how we track raw literal types in the semantic
analysis phase. It makes the following changes:

1.  Removes the `RawLiteralType` synthetic type.

2.  Adds a new `TypeOfAny`: `TypeOfAny.invalid_type` as suggested
    in python#4030.

3.  Modifies `AnyType` so it can optionally contain a new
    `RawLiteral` class. This class contains information
    about the underlying literal that produced that particular
    `TypeOfAny`.

4.  Adjusts mypy to stop recommending using `Literal[...]` when
    doing `A = NewType('A', 4)` or `T = TypeVar('T', bound=4)`.

    (The former suggestion is a bad one: you can't create a NewType of a
    Literal[...] type. The latter suggestion is a valid but stupid one:
    `T = TypeVar('T', bound=Literal[4])` is basically the same thing as
    `T = Literal[4]`.)

    This resolves python#5989.

The net effect of this diff is that:

1.  RawLiteralTypes no longer leak during fine-grained mode,
    which should partially help unblock
    python#6075.

2.  The way mypy handles literal expressions in types is "inverted".

    Previously, we by default assumed literal expressions would belong
    inside `Literal[...]` and tacked on some logic to make them convert
    into error `AnyTypes`. Now, we do the reverse: we start with an
    error `AnyType` and convert those into `Literal[...]`s as needed.

    This more closely mirrors the way mypy *used* to work before we
    started work on Literal types. It should also hopefully help reduce
    some of the cognitive burden of working on other parts of the
    semantic analysis code, since we no longer need to worry about the
    `RawLiteralType` synthetic type.

3.  We now have more flexibility in how we choose to handle invalid
    types: since they're just `Anys`, we have more opportunities to
    intercept and customize the exact way in which we handle errors.

    Also see python#4030 for additional
    context. (This diff lays out some of the foundation work for that
    diff).
@Michael0x2a
Copy link
Collaborator Author

One meta-note:

I think my "fix" of just removing the assert from the astmerge visitor in #6075 might have actually been fine: after analyzing the fine-grained logic, it seemed to me that encountering RawLiteralTypes during that phase of the update logic might have actually been expected behavior.

However, I decided that it would just be easier to rip out RawLiteralType altogether instead of sitting down and formalizing this hunch.

I do feel a little bad that this PR tacks on some additional cruft to AnyType, but we're already tracking a lot of provenance-related info in that class, so I felt this wasn't too extreme of a change.

@Michael0x2a Michael0x2a mentioned this pull request Dec 31, 2018
42 tasks
@JukkaL
Copy link
Collaborator

JukkaL commented Jan 2, 2019

This seems too hacky to me. The new responsibility of AnyType seems like an implementation shortcut but doesn't really make sense conceptually, at least to me. Is there something that makes it hard to fix the leaking of raw literal types in fine-grained incremental mode using the existing implementation technique?

Adding TypeOfAny.invalid_type while also keeping RawLiteralType sounds like a more promising approach to me, though I haven't thought about this very carefully.

@ilevkivskyi
Copy link
Member

I am actually with Jukka here.

Also, if we add TypeOfAny.invalid_type we can also easily remove this TODO item in typeanal.py:

        # TODO: Would it be better to always return Any instead of UnboundType
        # in case of an error? On one hand, UnboundType has a name so error messages
        # are more detailed, on the other hand, some of them may be bogus.
        return t

@ilevkivskyi
Copy link
Member

Also, it would make sense to use invalid_type in some other places like this:

        else:  # sym is None
            if self.third_pass:
                self.fail('Invalid type "{}"'.format(t.name), t)
                return AnyType(TypeOfAny.from_error)  # <- here

@Michael0x2a
Copy link
Collaborator Author

Adding TypeOfAny.invalid_type while also keeping RawLiteralType sounds like a more promising approach to me, though I haven't thought about this very carefully.

The main problem with this approach is that I think adding just TypeOfAny.invalid_type doesn't actually buy us much. It lets us determine broadly where that TypeOfAny came from, but I don't think we'd necessarily have enough context to do anything useful in places that end up using that Any.

And if we do add some context to Any, I think we'd end up with a solution similar to what this PR does. That was actually what I was hoping to eventual do: follow-up by generalizing RawLiteral so it context about any kind of invalid type (and I'd perhaps rename the class to InvalidTypeContext or something).

In any case, I'll look into just plugging the leak as well -- but I still think it's worth adding in some variation of this PR if we want to add TypeOfAny.invalid_type.

@Michael0x2a
Copy link
Collaborator Author

Is there something that makes it hard to fix the leaking of raw literal types in fine-grained incremental mode using the existing implementation technique?

Ok, after doing more poking around, I'm pretty confident that we're not actually leaking anything and that removing the assert is the from astmerge's TypeReplaceVisitor.visit_raw_literal_type is the correct thing to do.

In short, we're invoking this class within NodeReplaceVisitor.fixup_type. One of the places this function is called is inside NodeReplaceVisitor.process_base_func, which looks like this:

def process_base_func(self, node: FuncBase) -> None:
    self.fixup_type(node.type)
    node.info = self.fixup(node.info)
    if node.unanalyzed_type:
        # Unanalyzed types can have AST node references
        self.fixup_type(node.unanalyzed_type)

It looks like node.unanalyzed_type has (intentionally) not been semantically analyzed, which is why we're encountering the RawLiteralTypes. This is also probably why TypeReplaceVisitor implements visit_unbound_type (I didn't notice it did until just now).

@ilevkivskyi
Copy link
Member

@Michael0x2a as I mentioned in #6075, I am fine to merge it as is. So you can keep only the parts necessary to fix #5989 in this PR. Optionally, you can also fix the TODO item I mentioned above.

@ilevkivskyi
Copy link
Member

(Also I would keep TypeOfAny.invalid_type because it will be useful for #4030.)

@Michael0x2a
Copy link
Collaborator Author

I think I'll just close this PR and submit a new one with the error message fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: error messages for literal expressions in invalid places are worse
3 participants