Strict configuration #1098
Could you describe the use case you have in mind for this feature? I'm trying to understand whether your goal is closer to using it for runtime type-checking when instantiating models.

I would be in favor of supporting this if we could come up with a simple implementation with minimal impact on existing logic. I can't tell right now, from just thinking about it, whether the approach you have described would meet the bar in terms of simplicity; @samuelcolvin might have different opinions. One other possible approach might be to introduce a generic type called `Strict`.
My use case is that I'm planning on using Pydantic's JSON Schema integration to autogenerate clients for my frontend applications. In that setup, it should theoretically be impossible for my backend to receive invalid data.

I'm not a huge fan of the `Strict*` types approach.

Sounds like you're at least open to seeing a PR? Since you indicated others might have differing opinions, I'll wait a few days, and if nothing changes, I'll give it a shot and we can go from there?
For what it's worth, I think you could implement what you are describing through the use of a shared base class:

```python
from typing import Any

from pydantic import BaseModel


class PossiblyStrictModel(BaseModel):
    def __init_subclass__(cls):
        if getattr(cls.__config__, "strict", False):
            def __init__(__pydantic_self__, **data: Any) -> None:
                for k, v in data.items():
                    field = __pydantic_self__.__fields__[k]
                    if not isinstance(v, field.type_):  # not right for container types
                        raise TypeError(f"Received argument {v!r} of incorrect type for field {field}")
                # zero-argument super() has no __class__ cell inside this nested
                # function, so the class must be named explicitly
                super(cls, __pydantic_self__).__init__(**data)

            cls.__init__ = __init__


class Model(PossiblyStrictModel):
    a: int

    class Config:
        strict = True


Model(a=1)
Model(a="1")  # error; stops erroring if you set Config.strict = False
```

I'm not super excited about the prospect of officially supporting this in pydantic because 100% correctness would probably come with a large number of edge cases to test, and an ongoing burden of ensuring both approaches work for every possible field type. I'm certainly open to seeing a PR; I just think acceptance would depend on reaching a relatively high bar of simplicity, to ensure it doesn't increase the ongoing maintenance burden and doesn't discourage future feature work because of backwards-compatibility challenges.

If you want to take a shot, I'll review it. @samuelcolvin tends to have strong opinions on these sorts of issues though, so you might want to wait for his response first.
Hi, sorry for not replying earlier. Thanks for submitting the request and taking the time to explain why you need it. I'm basically pro implementing this if it's possible.
One point where I think you're incorrect @maxrothman is that validating an input to a float field does just call `float(v)`.

In terms of implementation, I think the simplest solution would be to simply call `isinstance()` against the input. We'd need to store the expected type on a field, and might need some special logic for cases like booleans, but I think that's doable. This would make strict mode slightly slower than normal mode, but I think that's bearable; if that becomes a problem I can think of a number of ways of working around it.

In terms of a switch to enable this, I think a config flag would be the easiest solution.

Thoughts?
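As a rough sketch of the idea described above (check the input's type against a stored expected type before the normal validator runs, with a special case for booleans), something like the following could work. The names here are illustrative, not pydantic internals:

```python
# Wrap a normal (lenient) validator so it first rejects inputs that are
# not already of the expected type.  The bool special case exists because
# bool is a subclass of int in Python, so isinstance(True, int) is True.
def strict_wrap(validator, expected):
    def run(value):
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{value!r} is a bool, not {expected}")
        if not isinstance(value, expected):
            raise TypeError(f"{value!r} is not an instance of {expected}")
        return validator(value)  # the lenient validator still runs, e.g. float()
    return run

strict_float = strict_wrap(float, (int, float))
strict_float(3)  # accepted: returns 3.0
```

Calling `strict_float("3")` or `strict_float(True)` would raise `TypeError`, which matches the "slightly slower than normal mode" trade-off: strict mode does one extra `isinstance` check per field.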
I'm going to second that having a strict mode would be very useful. I would like to default to strict while allowing certain type coercions when specified. To illustrate, yesterday I spent some time debugging a unit test that was broken in a non-obvious way. It turned out that Pydantic was coercing a UUID to an int 😐. In practice, we almost always know exactly what the type should be and would like to enforce that.
For the foreseeable future (at least until another major release) strict mode will not be the default, but usage should be as simple as changing your import.
@samuelcolvin thank you for being open to this proposal! Your approach towards this ("bad") request is an exemplar of great open-source management.

Having given this a little more thought, I've realized a slight problem in the straight-up `isinstance` approach: it'll result in a worse dev experience when deserializing complex objects (e.g. datetimes). To fix that, I think we need a split interface: one for instantiating model instances internally, and one for deserializing external data. The deserialization API would be strict-ish, in that the wire types of certain Python types would be the same type in JSON. This could mostly be implemented by using the existing validators.

With all that out of my brain, here's my proposal: add a `strict` flag to model config, and build some method to deal with types that have multiple useful serializations.
I think @samuelcolvin's point is that the strict check would happen on top of the existing validation logic.
This awkwardness has come up various times recently. I think the problem boils down to different "degrees" of desired parsing, and the fact that there are a huge number of edge cases and desired behaviors to handle if you try to go beyond "parse everything" and "parse nothing". For what it's worth, if you are "instantiating model instances internally", there is the `construct()` method, which skips validation entirely.

(For what it's worth, I totally think it would make sense to offer runtime type-checking of `construct()` inputs.)

I could see an argument being made for having a config setting that lets you toggle whether validation runs. This would retain the ability to use custom validators to parse missing fields, perform non-idempotent transformations, etc., and would make it easy to turn on the usual validators for testing, but would also let you get closer to raw `construct()` performance.
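A toy, pure-Python illustration of the two entry points being discussed: a validating/coercing constructor versus a `construct()`-style bypass that trusts its input (as pydantic v1's real `Model.construct()` does). `TinyModel` is a made-up stand-in, not pydantic source:

```python
class TinyModel:
    fields = {"x": int}

    def __init__(self, **data):
        # lenient coercion, roughly what normal validation does
        for name, typ in self.fields.items():
            data[name] = typ(data[name])
        self.__dict__.update(data)

    @classmethod
    def construct(cls, **data):
        obj = cls.__new__(cls)     # skip __init__, and with it all validation
        obj.__dict__.update(data)  # the caller is trusted to pass correct types
        return obj

TinyModel(x="1").x            # coerced to the int 1
TinyModel.construct(x="1").x  # left as the str "1"
```

The bypass is fast precisely because nothing is checked, which is why offering optional runtime type-checking on top of it keeps coming up.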
Oh, I didn't know about `construct()`!

My proposal again, but edited taking the above into account: add a `strict` flag to model config. This flag changes the parsing behavior of the built-in types: parsing would be altered to perform strict deserialization. Some method is built to deal with types that have multiple useful serializations.
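On "types that have multiple useful serializations": one way to picture it is a per-format encoder table, so the same value can target different wire encodings. This is purely a sketch of the idea; the `ENCODERS` registry and `serialize` helper are made-up names, not a pydantic API:

```python
from datetime import datetime, timezone

# made-up registry: each type can carry several named wire encodings
ENCODERS = {
    datetime: {
        "iso": lambda d: d.isoformat(),
        "epoch": lambda d: d.timestamp(),
    },
}

def serialize(value, fmt):
    encoder = ENCODERS.get(type(value), {}).get(fmt)
    return encoder(value) if encoder else value  # unknown types pass through

dt = datetime(2020, 1, 1, tzinfo=timezone.utc)
serialize(dt, "iso")    # '2020-01-01T00:00:00+00:00'
serialize(dt, "epoch")  # 1577836800.0
```

A strict parser would then only need to invert the encoding it was asked for, rather than guess which of several representations it was handed.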
Yes, it can be properly type-checked, but you have to use the pydantic mypy plugin (I believe the latest release still requires mypy 0.740; hopefully we can push a new release soon to be compatible with the latest). Docs here.
I'm pretty opposed to changing the default behaviour. I think one of the reasons pydantic is popular is that for many cases it "just works"; it's also (I think) relatively easy to use for inexperienced developers. I'm opposed to changes which increase the cognitive burden of getting started with pydantic, even if it helps some people.
I'm afraid that's not the whole story; there are also all the custom pydantic types, of which I count 33. Each would need consideration, tests, and in some cases modification. I think to keep things simple we should have two modes:
Then we add two generic types (or at least types that can be parameterised), e.g. `Strict[int]`.

Does that make sense? @dmontagu, @koxudaxi: what do you think? There are still some weird cases to consider. If you want a way to parse datetimes from ISO-8601 strings but nothing else, either:
@samuelcolvin I agree with the point about having it "just work", not to mention how much pain would be involved in migrating if we fundamentally changed the behavior of existing models. I'm fine with the idea of a `Strict[...]` generic type.

In some ways I think it could actually simplify the mypy plugin, potentially allowing us to remove the "strict" config setting since it would be specified on the actual object. That said, I also think it would be a substantial amount of work to get it working with mypy, but probably not more than it would take to carefully and correctly implement the feature anyway.
@samuelcolvin I agree with your thoughts. I'd like pydantic to keep its simple design.
My goal is to use Pydantic to define domain types within my backend system that can be bridged to systems in other languages (e.g. Javascript) over a network boundary utilizing Pydantic's JSON Schema features. The idea is that I'll be able to define, say, a User that has a UUID, a hair color, and a number of shirts:

```python
from enum import Enum, auto

from pydantic import BaseModel, Field, UUID4


class HairColor(Enum):
    RED = auto()
    BROWN = auto()
    BLONDE = auto()
    BLUE = auto()


class User(BaseModel):
    uuid: UUID4
    hair_color: HairColor
    num_shirts: int = Field(..., le=100)
```

Thanks to Pydantic's JSON Schema feature, I can now use OpenAPI's codegen library to autogenerate a client library for Javascript. That way, it'll never even be possible for my frontend to send an invalid hair color or an illegally-large value for `num_shirts`. Basically, I can make the shape of my objects over the wire an implementation detail and pretend the network boundary doesn't exist for the purposes of my application logic, which greatly simplifies development while improving correctness.

However, imagine that a weird bug appeared in my frontend application that somehow circumvented my autogenerated client and put some other value into one of those fields. Regardless of whether I use strict types, I'd want parsing to work like this:

```python
>>> data
'{"uuid": "f84ede9d-fb19-4f35-8223-a209a858df57", "hair_color": 1, "num_shirts": 5}'
>>> User(**json.loads(data))
User(uuid=UUID('f84ede9d-fb19-4f35-8223-a209a858df57'), hair_color=<HairColor.RED: 1>, num_shirts=5)
```

So in essence, I'm looking for a (de)serializer that's capable of easily parsing wire formats into type-safe domain types, including non-primitive types (such as the UUID in the example above), and that can parse without information loss. I have no need for a runtime type checker, and if I did, either I could write one, or I could use one of the other projects on PyPI that provides that functionality. Pydantic is almost what I'm looking for, except that it is lenient when parsing certain types (e.g. bool, float, datetime) in such a way that information loss can occur, which could hide bugs (which would be bad).

A class-level flag (or a different base class) that made non-strict types act like strict types would fulfill my use case: I'd still be able to parse complex types from raw data easily, and the risk of information loss would be removed. A flag that replaced the non-strict parsers with `isinstance` checks would not:

```python
>>> data = {"uuid": "f84ede9d-fb19-4f35-8223-a209a858df57", "hair_color": 1, "num_shirts": 5}
>>> User(**data)
Error: uuid is not of type UUID4, hair_color is not of type HairColor
>>> # I'd have to take extra steps to parse this type:
>>> User(uuid=UUID4(data['uuid']), hair_color=[k for k in HairColor if k.value == data['hair_color']][0], num_shirts=5)
User(uuid=UUID('f84ede9d-fb19-4f35-8223-a209a858df57'), hair_color=<HairColor.RED: 1>, num_shirts=5)
```

In short, I still want Pydantic to be a parsing library; I just want to be able to configure it to be pickier. Obviously this wouldn't achieve my full desire to have all parsing comply with a stable wire format.
@maxrothman I understand where you're coming from, but what you're asking for is a very specific mixture of strictness and coercion. There are hundreds of potential cases where it would be reasonable to be strict in one case and lenient in another.

These questions aren't specific to pydantic or even Python; JavaScript and Ruby grapple with them too. Remember this? I suspect our misunderstanding comes from the following: you're using JSON to transfer data. You therefore want to coerce types when there's no JSON-equivalent type (e.g. str to UUID or str to datetime) but not between JSON types (e.g. from float to int or str to float). The problem is that pydantic isn't just used for HTTP APIs, and therefore the input data isn't always JSON. Look at popular projects using pydantic: most of them are much closer to runtime type checking than HTTP APIs. Or more correctly: pydantic is generally used to "secure" data types at code boundaries, but those boundaries take numerous forms and we can't make assumptions about the types people will be passing across them.

I see a few options here; I'm genuinely not sure which is best.

Edit: poll of users and what they want:
"Plus one" me if you want full strictness.

"Plus one" me if you want partial strictness.
Ahh, I understand now. I hadn't made the connection that Pydantic was agnostic as to the wire format. I can see now why my original request doesn't make much sense: stable de/serialization is a problem that can only really be solved for specific formats. In light of that, I no longer support a class-level "strictish" config flag. I don't have an opinion about a truly strict (`isinstance`-based) mode. Thoughts?
Ignore my last comment; apparently that's a different strictness that is already supported. It's late. I'm tired. This is a pretty cool library though, thank you.
Happy new year everyone! Just wanted to get this discussion moving again. Any thoughts on my last comment @samuelcolvin @dmontagu?
It sounds like you want the partial strictness I suggested above.

Out of interest, what's wrong with the wildcard validator approach partially demonstrated here? I know it'll be slightly slower than proper validators, but I very much doubt you'd notice the difference. You might also be interested in #1130 (comment).
It'd get me part of the way there, but it would depend on validators/parsers for complex types working the way I'd want them to with their JSON representations. I'd end up with a case for every type, which would start to look a whole lot like the validators pydantic already ships.

I think my most recent comment might've gotten a bit lost in the shuffle, so I'm going to lay out exactly what I am and am not suggesting.

What I'm NOT suggesting

What I am suggesting

My questions

Idle thoughts

Thoughts?
Yes, I want the functionality that I think you're looking for; hence why I'm spending time on this issue. If I just didn't want it, I'd close the issue. It's just that we disagree about how to go about making it available, and what work might be required. I'm afraid I don't like the idea of named per-format methods.

Making pydantic parsing/serialization idempotent is a whole different story. It's heavily related to #951, #624, and the whole way we do serialisation; it's another subject altogether. I have some ideas I want to work on quite closely related to #317, but I haven't had time yet. I already spend too much time on pydantic; for example, I've spent multiple hours on this issue alone: reading, replying, thinking, writing gists.

I don't think that's true. If I understand you correctly, your input data is JSON, so it can only have 7 types; you want to make sure data is the correct type of those 7. This should be possible in a relatively short function. Please have a go at implementing this as a validator and come back when it's working or not working. More verbose discussion on this issue without trying something isn't moving things forward.
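A stab at that "relatively short function": given a raw value and the field's expected Python type, accept it only if it is already the matching JSON-native type, and let non-JSON-native types (UUID, datetime, ...) pass through for lenient parsing. This is an illustrative sketch, not a drop-in pydantic validator; the names are made up:

```python
# map a field's expected Python type to the JSON-native types allowed for it
JSON_NATIVE = {
    int: int,
    float: (int, float),   # JSON has one number type; an int is a valid float
    str: str,
    bool: bool,
    type(None): type(None),
    list: list,
    dict: dict,
}

def check_json_type(value, expected):
    allowed = JSON_NATIVE.get(expected)
    if allowed is None:
        return value  # e.g. UUID or datetime: let the normal parser handle it
    if isinstance(value, bool) and expected is not bool:
        raise TypeError(f"{value!r} is a bool, expected {expected.__name__}")
    if not isinstance(value, allowed):
        raise TypeError(f"{value!r} is not a valid {expected.__name__}")
    return value
```

Hooked up as a wildcard (`"*"`, `pre=True`) validator, this would reject cross-type coercions between JSON types while leaving string-to-UUID style parsing intact.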
This sounds like a really elegant solution to the problem to me. I'd be pretty excited if this got implemented!
I'm sorry if I came off as frustrated; there's no body language on the internet 😅 Since my most recent proposal departed fairly significantly from my original proposal, I wanted to make sure you were still on board. It sounds like you are, which is great!
I can understand that, it definitely would create a more confusing API. My thinking was that if you assume that a user wants stable/idempotent/whatever-you-want-to-call-it parsing, then it's impossible to support that without having a way to designate which format you want to de/serialize into. You could put that designation on the model, but that'd preclude the ability to have multiple serializations of the same object: to JSON, to querystring, to form-encoding, etc. That's why I ended up on named methods, it's the only design I could think of that supports both stable serialization and multiple serialization formats on a single model. In any case, I thought there was a chance that the API change would be both niche enough and inconvenient enough to most users that it'd make sense to split it out into a separate library, hence that suggestion.
Ok, so it sounds like the companion-library option is definitely on the table. Great! I would have those same requirements too, so we're on the same page. To be clear, if I did go for the companion-library approach, it'd be to expand the Pydantic ecosystem while having a way to experiment without affecting the majority of users. A fork, hostile or otherwise, is not an option for me. It also sounds like you're interested in tackling stable de/serialization at some point, which is also great! I think my next step will be to attempt a proof of concept to get a feel for the design space, then check in again on where Pydantic is with stable de/serialization. I'm more than happy to prove a concept out independently and present it for inclusion once it's more well-developed. In any case, it sounds like stable de/serialization is a problem you're thinking about, so we'll just see who gets there first 🙂
This would be wonderful, especially being able to specify it by field. I am fine if a "1" is coerced into a 1 for an integer field, but there is no way I want a "1" coerced into a datetime field. I don't see myself using Pydantic for anything until I can prevent random numbers from becoming datetimes.
I've posted a possible workaround here: #2079 (comment).

```python
from types import new_class
from typing import Any, Callable, Generator, Generic, TypeVar, cast

from pydantic.utils import display_as_type
from typingx import isinstancex

T = TypeVar("T")


class Strict(Generic[T]):
    __typeform__: T

    @classmethod
    def __class_getitem__(cls, typeform: T) -> T:
        new_cls = new_class(
            f"Strict[{display_as_type(typeform)}]",
            (cls,),
            {},
            lambda ns: ns.update({"__typeform__": typeform}),
        )
        return cast(T, new_cls)

    @classmethod
    def __get_validators__(cls) -> Generator[Callable[..., Any], None, None]:
        yield cls.validate

    @classmethod
    def validate(cls, value: Any) -> Any:
        if not isinstancex(value, cls.__typeform__):
            raise TypeError(f"{value!r} is not a valid {display_as_type(cls.__typeform__)}")
        return value
```

Then use it wherever you want:

```python
class M(BaseModel):
    x: int
    x_strict: Strict[int]
    y: List[Dict[str, int]]
    y_strict: Strict[List[Dict[str, int]]]


M(x='1', x_strict='1', y=[{'a': 1}, {'b': '2'}], y_strict=[{'a': 1}, {'b': '2'}])
# pydantic.error_wrappers.ValidationError: 2 validation errors for M
# x_strict
#   '1' is not a valid int (type=type_error)
# y_strict
#   [{'a': 1}, {'b': '2'}] is not a valid List[Dict[str, int]] (type=type_error)
```

But as Samuel explained, "strictness" depends a lot on types. Anyway, hope it helps! 😄
@samuelcolvin what is the status of this issue?
Hi, I've prepared a tiny PR, #2509, which adds an option for this.
Strict types are available in pydantic: https://pydantic-docs.helpmanual.io/usage/types/#strict-types it's just sad there is no class-wide strict mode implemented yet: See: pydantic/pydantic#1098 Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
Rather than provide a "strictness" flag for Pydantic, a more flexible approach would be to allow overriding the "default" set of validators/converters used by Pydantic, which are currently hardcoded. This would reduce the need for one-off additions to Pydantic's configuration for default validation characteristics such as maximum string length, whitespace stripping, etc.
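A hypothetical sketch of that idea (not pydantic's actual API): instead of a single hardcoded table of default validators, let callers supply per-type overrides at validation time. All names here are made up for illustration:

```python
DEFAULT_VALIDATORS = {int: int, float: float, str: str}

def strict_int(v):
    # an override that refuses coercion instead of calling int(v)
    if type(v) is not int:
        raise TypeError(f"{v!r} is not an int")
    return v

def validate(data, annotations, overrides=None):
    # merge caller-supplied overrides over the default table
    table = {**DEFAULT_VALIDATORS, **(overrides or {})}
    return {name: table[annotations[name]](value) for name, value in data.items()}

validate({"n": "1"}, {"n": int})                             # default: coerces to {'n': 1}
validate({"n": 1}, {"n": int}, overrides={int: strict_int})  # strict override: passes
```

With an override point like this, "strict mode" becomes just one particular override table rather than a dedicated flag, and settings like max string length or whitespace stripping could be expressed the same way.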
Sharing this workaround I made in case it might come in handy for anyone else.
Would love to see this exposed via config.
This is fixed on main and will be included in V2; you can try it now on the alpha release.
Feature request: strict mode configuration flag
This issue has been a sensitive one in the past, so I'm trying to tread lightly here. Pydantic's position so far has been that because it's a parsing, not a validation library, fields should coerce their values to the specified type if possible (e.g. a field type-hinted with `float` given `"1"` coerces it to `1.0`), and that `Strict*` fields (e.g. `StrictFloat`, `StrictBool`, …) are available if users prefer to use them. Requests for a way to make default types behave strictly have been closed due to implementation concerns.

However, the codebase has evolved significantly since #578, #360, and #284 were closed, and I think that some of the earlier difficulties in building this feature are no longer present today.

In #284, the reason given for not including a strict mode config was that pydantic would no longer be able to just call `float(x)` and pass the errors along, and that there are edge cases around bools and ints. But in 4f4e22e, the validator for `float` types became `validators.float_validator()` rather than `float()`; `strict_int_validator()` was added in 1b467da and distinguishes between bools and ints; and on master, `float_validator()` actually does extra work compared to `strict_float_validator()` to accept non-float values.

As far as I can tell, the only thing required to implement this flag now would be to build `validators._VALIDATORS` conditionally for each model based on a config value, using the `strict_*` validators rather than the standard ones when the flag was set.

There are several different ways this feature could be implemented: as a global flag, as another property on the model config, etc. I don't have strong opinions on this topic, since in my code I would probably enable this configuration universally.

Thoughts? The lack of this feature is the only thing keeping me from strongly recommending the use of Pydantic in my workplace, and it's clearly a feature that others are interested in having as well, as evidenced by the prior requests. I would be interested in contributing a PR for this change if you're interested in pursuing it.