
RFC: Refactoring transformers and Zod 3 #264

Closed
colinhacks opened this issue Dec 9, 2020 · 35 comments

@colinhacks
Owner

colinhacks commented Dec 9, 2020

TLDR

Transformers are poorly designed. I want to reimplement them, but there will be breaking changes. Under this proposal, Zod 2 will never come out of beta and we'll jump straight to Zod 3 with a better implementation of transformers. Skip to "A path forward" for details.

The reason Zod 2 has been in beta for so long (well, one of the reasons anyway!) is that I’ve been increasingly unsatisfied with Zod’s implementation of transformers. Multiple issues have cropped up that have led me to re-evaluate the current approach. At best, transformers are a huge footgun and at worst they’re fundamentally flawed.

Context

Previously (in v1) the ZodType base class tracked the inferred type in a generic parameter called Type. But transformers accept an Input of one type and return an Output of a different type. To account for this, Type was renamed to Output and a third generic parameter (Input) was added. For non-transformers, Input and Output are the same. For transformers only, these types are different.


Let's look at stringToNumber as an example:

const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));

For stringToNumber, Input is string and Output is number. Makes sense.

What happens when you pass a value into stringToNumber.parse?

  1. The user passes a value into stringToNumber.parse
  2. The transformer passes this value through the parse function of its input schema (z.string()). If there are parsing errors, it throws a ZodError
  3. The transformer takes the output from that and passes it into the transformation function (val => parseFloat(val))
  4. The transformer takes the output of the transformation and validates it against the output schema (z.number())
  5. The result of that call is returned
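
In pseudo-code (a minimal sketch of the flow above, not Zod's actual internals):

interface MiniSchema<Out> {
  parse(data: unknown): Out; // throws on invalid input
}

function transformer<A, B>(
  input: MiniSchema<A>,
  output: MiniSchema<B>,
  fn: (val: A) => unknown // should produce the Input of the output schema (see the takeaway below)
): MiniSchema<B> {
  return {
    parse(data) {
      const parsedInput = input.parse(data);  // step 2: validate against the input schema
      const transformed = fn(parsedInput);    // step 3: run the transformation function
      return output.parse(transformed);       // steps 4 and 5: validate against the output schema and return
    },
  };
}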

Here's the takeaway: for a generic transformer z.transformer(A, B, func), where A and B are Zod schemas, the argument of func should be the Output type of A and the return type is the Input type of B. This lets you do things like this:

const stringToNumber = z.transformer(z.string(), z.number(), val => parseFloat(val));
const numberToBoolean = z.transformer(z.number(), z.boolean(), val => val > 25);
const stringToNumberToBoolean = z.transformer(stringToNumber, numberToBoolean, val => 5 * val);

The problems with transformers

After implementing transformers, I realized transformers could be used to implement another much-requested feature: default values. Consider this:

const stringWithDefault = z.transformer(z.string().optional(), z.string(), val => val || "trout");
stringWithDefault.parse("marlin") // => "marlin"
stringWithDefault.parse(undefined) // => "trout"

Voila, a schema that accepts string | undefined and returns string, substituting the default "trout" if the input is ever undefined.

So I implemented the .default(val:T) method in the ZodType base class as below (partially simplified)

default(def: Input) {
  return ZodTransformer.create(this.optional(), this, (x: any) => {
    return x === undefined ? def : x;
  });
}

Do you see the problem with that? I didn’t. Neither did anyone who read through the Transformers RFC which I left open for comment for a couple months before starting on implementation.

Basically this implementation doesn’t work at all when you use it on transformers.

Side note: should the .default method for stringToNumber accept a number or a string? As implemented above it should accept a string (the Input). But already this is unintuitive to many people.

stringToNumber.default("3.14")

// EQUIVALENT TO
const defaultedStringToNumber = z.transformer(
  stringToNumber.optional(),
  stringToNumber,
  val => val !== undefined ? val : "3.14"
)

defaultedStringToNumber.parse("5")
/* { ZodError: [
  {
    "code": "invalid_type",
    "expected": "string",
    "received": "number",
    "path": [],
    "message": "Expected string, received number"
  }
] } */

Let's walk through why this fails. The input ("5") is first passed into the transformer's input schema (stringToNumber.optional()), which converts the string "5" to the number 5. That value is then passed into the transformation function. But wait: val is now number | undefined, yet the transformation function needs to return a string, because its result is fed into the output schema (stringToNumber), and passing the number 5 into stringToNumber.parse will throw. So we would need to convert 5 back to "5". That may seem easy in this toy example, but it's not possible in the general case: Zod can't magically undo the transformation function.
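
To make the type mismatch concrete, here is a sketch of the (hypothetical, fully typed) signature the transformation function in defaultedStringToNumber would need, versus the function we actually want to write:

// Output of stringToNumber.optional() -> Input of stringToNumber
type RequiredFn = (val: number | undefined) => string;

// the default-substituting function we want to pass:
const fn = (val: number | undefined) => (val !== undefined ? val : "3.14");
// fn returns number | string, which is not assignable to RequiredFn:
// Zod would have to "un-parse" the number back into a string, and it can't.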

In practice, the current definition of default in ZodType shouldn't even have been possible. The only reason the type checker didn't catch this bug is that there are a regrettable number of anys floating around in Zod. It's not a simple matter to switch them all to unknowns either; I've had to use any in several instances to get type inference and certain generic methods to work properly. I've tried multiple times to reduce the number of anys but I've never managed to crack it.

It’s possible this is a one-off issue. I could find some other way to implement .default() that doesn’t involve transformers. Unfortunately this isn’t even the only problem in Zod’s implementation.

The .transform method

Initially the only way to define transformers was with z.transformer(A, B, func). Eventually I implemented a utility function you can use like this:

z.string().transform(z.number(), val => parseFloat(val));

// equivalent to
z.transformer(z.string(), z.number(), val => parseFloat(val));

Some users were executing multiple transforms in sequence without changing the actual data type:

z.string()
  .transform(z.string(), (val) => val.toLowerCase())
  .transform(z.string(), (val) => val.trim())
  .transform(z.string(), (val) => val.replace(" ", "_"));

To reduce the boilerplate here, it was recommended that I overload the method definition to support this syntax:

z.string()
  .transform((val) => val.toLowerCase())
  .transform((val) => val.trim())
  .transform((val) => val.replace(" ", "_"));

If the first argument is a function instead of a Zod schema, Zod should assume that the transformation doesn’t transform the type. In other words, z.string().transform((val) => val.trim()) should be equivalent to z.string().transform(z.string(), (val) => val.trim()). Makes sense.
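
Roughly, the two overloads look like this (a simplified sketch, not Zod's real type definitions):

interface Schema<Out> {
  parse(data: unknown): Out;
  // two-argument form: transform into a different schema/type
  transform<NewOut>(output: Schema<NewOut>, fn: (val: Out) => NewOut): Schema<NewOut>;
  // one-argument form: the type is assumed to be unchanged
  transform(fn: (val: Out) => Out): Schema<Out>;
}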

Consider using this method on a transformer:

stringToNumber.transform(/* transformation_func */);

What type signature do you expect for transformation_func?

Most would probably expect (arg: number) => number. Some would expect (arg: string) => string. Neither is right; it's (arg: number) => string. The transformation function takes an input of number (the Output of stringToNumber) and must return a string (the Input of stringToNumber). This type signature is a natural consequence of a series of logical design decisions, but the end result is dumb. Intuitively, you should be able to append .transform(val => val) to any schema. Unfortunately, due to how transformers are implemented, that's not always possible.

More complicated examples

The fact that I incorrectly implemented both .transform and .default isn't even the real problem. The problem is that transformers make it difficult to write any generic functions on top of Zod (of which .transform and .default are just two examples). Others have encountered similar issues: #199 and #213 are more complicated cases of the same thing. Nested transformers in particular are a minefield.

A path forward

When I set out to implement transformers, I felt strongly that each transformer should have a well-defined input schema and output schema. This led me to implement transformers as a separate subclass of ZodType (ZodTransformer) in an attempt to make transformers compose nicely with other schemas. This is the root of the issues I've laid out above.

Instead I think Zod should adopt a new approach. For the sake of differentiation I’ll use a new term "mods" instead of "transformations". Each Zod schema has a list of post-parse modification functions (analogous to Yup’s transform chain). When a value is passed into .parse, Zod will type check the value, then pass it through the mod chain.

const schema = z.string()
  .mod(val => val.length)
  .mod(val => val > 100);

type In = z.input<typeof schema> // string
type Out = z.output<typeof schema> // boolean

Unlike before, Zod doesn’t validate the data type between each modification. We’re relying on the TypeScript engine to infer the correct type based on the function definitions. In this sense, Zod is behaving just like I intended; it’s acting as a reliable source of type safety that lets you confidently implement the rest of your application logic — including mods. Re-validating the type between each modification is overkill; TypeScript’s type checker already does that.

Each schema will still have an Input (the inferred type of the schema) and an Output (the output type of the last mod in the mod chain). But because we’re avoiding the weird hierarchy of ZodTransformers everything behaves in a much more intuitive way.
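
A minimal sketch of how a mod chain could be stored and executed (simplified, not the real implementation):

type Mod = (val: any) => any;

class MiniModSchema<Input, Output> {
  constructor(
    private check: (data: unknown) => Input, // the underlying type check
    private mods: Mod[] = []
  ) {}

  mod<NewOut>(fn: (val: Output) => NewOut): MiniModSchema<Input, NewOut> {
    return new MiniModSchema<Input, NewOut>(this.check, [...this.mods, fn]);
  }

  parse(data: unknown): Output {
    // type check once, then run the mod chain without re-validating between steps
    let val: any = this.check(data);
    for (const fn of this.mods) val = fn(val);
    return val;
  }
}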

One interesting ramification is that you could interleave mods and refinements. Zod could keep track of the order each mod/refinement was added and execute them all in sequence:

const schema = z.string()
  .mod(val => parseFloat(val))
  .refine(val => val > 25, { message: "Too short" })
  .mod(val => `${val}`)
  .refine(val => /^\d+$/.test(val), { message: "No floats allowed" });

Compatibility

I was using the "mod" terminology above to avoid confusion with the earlier discussion. In reality I would implement the "mod chain" concept using the existing syntax/methods: .default, .transform, etc. In fact I think I could switch over to the "mod" approach without any major API changes.

  • A.transform(func): instead of returning a ZodTransformer, this method would simply append func to the "mod chain"
  • A.transform(B, func): this would return A.transform(func).refine(val => B.safeParse(val).success)
  • z.transformer(A, B, func): this could just return A.transform(func).refine(val => B.safeParse(val).success)
  • A.default(defaultValue): this is trickier but still possible. This function would instantiate A.optional().mod(val => typeof val !== "undefined" ? val : defaultValue). Then all the mods of A would be transferred over to the newly created schema
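
As a worked example of the mapping above (schemas and function are illustrative), A.transform(B, func) and z.transformer(A, B, func) would both desugar to roughly:

import { z } from "zod";

const A = z.string();
const B = z.number();
const func = (val: string) => parseFloat(val);

const desugared = A.transform(func).refine((val) => B.safeParse(val).success);

desugared.parse("3.14"); // => 3.14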

Under the hood things would be working very differently but most projects could upgrade painlessly unless they explicitly use the ZodTransformer class in some way (which isn’t common).

I would still consider this to be a breaking change of course. If/when I make these changes, I plan to publish them as Zod v3. In this scenario Zod 2 would never leave beta, we’d jump straight to v3.

This transformers issue has caused me a lot of grief and headaches, but I'm confident in the new direction; in fact I already have most of it working. I want to put this out for feedback from the community. The issues I'm describing are pretty subtle and not many people have run into them, but I believe the current implementation is untenable.

@o-alexandrov
Contributor

o-alexandrov commented Dec 9, 2020

The only regret I have about zod is:

  • type-related methods in classes, for example of z.number()'s:
    • min, max, etc.

The problem with them is:

  • if unused, they cannot be removed as part of the dead code removal (for example w/ terser) because it's possible class methods are dynamically requested

To shake off a lot of dead code in production, zod could:

  • remove all type-related methods like min, max etc
  • export them as ready-to-use either transform's or refine's functions
    • you could still protect the user from unwanted use of min for example for z.object(), by assigning proper typings, for example:
      • z.ZodTransformer<z.ZodNumber, z.ZodNumber>

@colinhacks colinhacks changed the title RFC: Removing transformers RFC: Refactoring transformers Dec 9, 2020
@colinhacks
Owner Author

That should be its own issue; it's not really related to this discussion

@o-alexandrov
Contributor

Sorry, I thought it could be part of the transformers change.
I'll keep an eye on this issue then and will try to express the thoughts above more clearly once this issue is resolved

@colinhacks
Owner Author

This is just a proposal at this stage, there may be a good reason not to do this that I'm not aware of.

I understand what you're proposing above and I'm against switching everything to functions, mostly because you lose autocomplete. I don't want to degrade the developer experience for everyone to appease a tiny minority who care about a couple extra kb in their bundle.

@colinhacks colinhacks changed the title RFC: Refactoring transformers RFC: Refactoring transformers and Zod 3 Dec 10, 2020
@colinhacks
Owner Author

@o-alexandrov I just published a v3 branch with the new transformers implementation. There aren't many changes outside of ZodTransformer. I'd like to publish the v3 beta once we have ESM working.

@carlpaten
Collaborator

To provide a data point: when I saw that Zod v2 had transformers I expected them to work pretty much exactly how you describe here, and I was confused when they didn't.

This proposal sounds grand to me. The only thing I'd like to flag is that it sounds like you're going to be implementing multiple transforms as lists of functions. The alternative would be to use function composition, which I suspect would be simpler and more performant. Have you considered that approach?

@jstewmon

👋 Hi, was trying out the proposed interface in 3.0.0-alpha.2, and found a couple of issues using a simple string to Date transform...

A.transform(B, func): this would return A.transform(func).refine(val => B.safeParse(val).success)

The latter works, but the former results in an error:

> z.string().transform(d => new Date(d)).refine(d => z.date().safeParse(d).success).parse('2020-12-30T22:01:06.839Z')
2020-12-30T22:01:06.839Z

> z.string().transform(z.date(), d => new Date(d)).parse('2020-12-30T22:01:06.839Z')
Uncaught TypeError: effect.mod is not a function
    at ZodTransformer.parse (/home/node/app/node_modules/.pnpm/zod@3.0.0-alpha.2_@types+node@14.14.17/node_modules/zod/lib/cjs/parser.js:860:41)

z.transformer(A, B, func): this could just return A.transform(func).refine(val => B.safeParse(val).success)

The former returns the result of A.parse:

> z.transformer(z.string(), z.date(), d => new Date(d)).parse('2020-12-30T22:01:06.839Z')
'2020-12-30T22:01:06.839Z'

@colinhacks
Owner Author

Sorry, I haven't implemented the A.transform(B, func) and z.transformer syntaxes yet. I think I might just get rid of those syntax options entirely after all; they're too confusing.

@jstewmon

jstewmon commented Jan 3, 2021

The refine interface is a bit awkward to work with following a transformation because there doesn't seem to be a way to leverage the fidelity of schemas thereafter.

Here's an example of a transform for parsing duration strings:

const duration = z
  .string()
  .transform(ms)
  .refine((val) => z.number().safeParse(val).success, {
    message: "invalid duration",
  });

The first thing I noticed is that I couldn't figure out how to use the existing options from z.number(). Without providing options, the ZodError will have { code: 'custom', message: 'Invalid value.'}

If I want to further refine the schema (e.g. range constraints), I can't use .min(x).max(y) - I have to wrap each one with refine and provide the options.

Would it be possible to pass a ZodType instance to refine, so that the return value of refine could be a ZodType that can be leveraged so that we can continue chaining refinements?

For example:

const duration = z
  .string()
  .transform(ms)
  .refine(z.number());

const durationRange = duration.positive().max(30_000);

@colinhacks
Owner Author

It's an interesting idea but I think it would get confusing fast. durationRange would be a ZodNumber instance that only accepts strings...which I don't like.

Also you can use the ZodNumber methods like this:

const duration = z
  .string()
  .transform(ms)
  .refine((val) => z.number().positive().max(30_000).safeParse(val).success, {
    message: "invalid duration",
  });

It's not quite what you're looking for but it makes everything more explicit.

To make this less verbose I could introduce a .refineWithSchema method that you use like this:

const duration = z
  .string()
  .transform(ms)
  .refineWithSchema(z.number().positive().max(30_000));

It's basically syntactic sugar over your .safeParse(val).success approach except it would automatically catch any ZodError thrown during parsing and return that, instead of throwing the generic { code: 'custom', message: 'Invalid value.'} error.
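
A rough userland sketch of that idea against the eventual Zod 3 API, using a superRefine-style hook with ctx.addIssue (parseDuration stands in for the ms package above):

import { z } from "zod";

const parseDuration = (val: string): number => Number(val); // stand-in for `ms`

const duration = z
  .string()
  .transform(parseDuration)
  .superRefine((val, ctx) => {
    const result = z.number().positive().max(30_000).safeParse(val);
    if (!result.success) {
      // forward the inner schema's issues instead of a generic "custom" error
      for (const issue of result.error.issues) ctx.addIssue(issue);
    }
  });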

@jstewmon

jstewmon commented Jan 4, 2021

durationRange would be a ZodNumber instance that only accepts strings...which I don't like.

This seems like an incidental detail to me. Is the instance type significant to calling code?

I have been reasoning about a schema as something that takes an input and returns a result of some type iff the input meets all the validation criteria. Knowing the type of the input isn't necessarily useful, since it often comes from deserializing an input.

Having read back over the examples in the proposal, I think the problem isn't with the interface / behavior of transform, though there is probably room for improvement.

IIUC, the existing transform method properly supports chaining / pipelining / composition of schemas, but the current proposal does not offer a substitute.

Let's break down the original problem case:

const stringToNumber = z.transformer(
  z.string(),
  z.number(),
  val => parseFloat(val)
);

stringToNumber.default("3.14")

const defaultedStringToNumber = z.transformer(
  stringToNumber.optional(),
  stringToNumber,
  val => val !== undefined ? val : "3.14"
)

stringToNumber is a parse pipeline that takes a string and returns a number because z.string() is the head and z.number() is the tail.

I actually would have expected stringToNumber.default("3.14") to produce a TS error b/c the parameter type should be number. This seems like an opportunity for improvement.

Still, if we recognize that transform is roughly equivalent to pipe, we can reason that specifying a default must be done before a value is required, which will typically be at the head of the pipeline:

z
  .string()
  .default("3.14")
  .transform(stringToNumber, val => val)
  .parse(undefined); // 3.14

We were able to reuse our stringToNumber schema through composition with transform.

If parsing a value is expensive, such that we'd prefer to provide a default for the tail instead of the head, we can do that with union and transform:

const shortCircuit = z.union([
  z.literal(undefined).transform(z.number(), x => 3.14),
  stringToNumber
]);
shortCircuit.parse(undefined); // 3.14
shortCircuit.parse("10"); // 10

The composition facilities of transform create a lot of leverage that is missing from the proposal.

Going through these examples, the only significant fault I found with transform is that the return type doesn't match the tail schema.

I think having an accurate transform return type and clarifying docs would be sufficient to avoid confusion while retaining the power of the existing functionality.

@jstewmon

jstewmon commented Jan 4, 2021

Going through these examples, the only significant fault I found with transform is that the return type doesn't match the tail schema.

Er... I forgot to try my original use case:

stringToNumber.positive().max(5).parse("10")
// Property 'positive' does not exist on type 'ZodTransformer<ZodString, ZodNumber>'

That doesn't work, presumably for the same reason why default doesn't have the signature I expected.

I actually would have expected stringToNumber.default("3.14") to produce a TS error b/c the parameter type should be number

Correction: I think the default parameter type should be never, since z.number() cannot use a default value.

@carlpaten
Collaborator

carlpaten commented Jan 4, 2021

[Content status: strong opinions held weakly; not passing value judgments, just offering suggestions]

I have been reasoning about a schema as something that takes an input and returns a result of some type iff the input meets all the validation criteria. Knowing the type of the input isn't necessarily useful, since it often comes from deserializing an input.

To concur: IMO, a parsing library is (morally) a toolbox for building functions of type A => B | ZodError, with the contents of ZodError reflecting A's structure. In a perfect world you'd want the library to permit composition of A => B | ZodError with B => C | ZodError (I believe this is called a "functorial" interface in functional programming). The hard part is keeping ZodError legible under a variety of potential combinations, and indeed that severely restricts the space of practical solutions.

If I understand correctly, @colinhacks proposes factoring A => B | ZodError into A => A | ZodError and A => B. @jstewmon would like to be able to compose this with a B => B | ZodError and possibly a B => C | ZodError. The thing is that unless you can somehow provide a reverse mapping from downstream validation errors to the original value, this will net you utterly incomprehensible error messages!

The question of how to reverse-map downstream errors into a form that reflects the original input is an interesting one, but I don't think it should be a blocker for work on transformers. Just surfacing "validate, then transform" solves a ton of real-world use cases. We can talk about performing validation on the transformed values in a later proposal - IMO in the vast majority of cases it's going to be an anti-pattern because it will completely wreck your error messages and perform likely unnecessary work on invalid inputs.

@colinhacks
Owner Author

colinhacks commented Jan 4, 2021

Beautifully put. The simplicity and intuitiveness of "validate, then transform" is a big part of the appeal. It also harmonizes nicely with how Zod is generally used: as a source of typesafety that lets you confidently access/manipulate data later.

I hadn't considered the problem of corresponding error messages to data structure 😬 Great point. I'm surprised no one's raised this issue since Zod 2 has been in beta.

That concept is relevant to another idea I mentioned above: the idea of catching ZodErrors inside refinements and surfacing them to the user (currently Zod intentionally doesn't catch errors that occur inside refinement functions). But since refinements and transformations can be interleaved under this proposal, there's no guarantee that the structure of the data in a refinement bears any resemblance to the input data structure. So I'm probably not going to do that.

@jstewmon composability is still possible with the new proposal, but you do it with functions instead of schemas. I think this is an acceptable if not preferable approach.

const stringToNumber = (val: string) => parseFloat(val);

z
  .string()
  .default("3.14")
  .transform(stringToNumber)
  .parse(undefined); // 3.14

You could compose pipelines of transformation functions with existing FP tools externally to Zod as well.
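
For example, with a hand-rolled pipe helper (just a sketch; pipe is not a Zod API):

import { z } from "zod";

const pipe = <A, B, C>(f: (a: A) => B, g: (b: B) => C) => (a: A): C => g(f(a));

const toNumber = (val: string) => parseFloat(val);
const clamp = (val: number) => Math.min(val, 100);

const schema = z.string().transform(pipe(toNumber, clamp));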

@carlpaten
Collaborator

carlpaten commented Jan 4, 2021

I hadn't considered the problem of corresponding error messages to data structure 😬 Great point.

My original writeup described the ideal error type as "an indexed type family that reifies the structure of A." But I didn't want to derail. It would be an interesting conversation to have (elsewhere); personally I don't know enough about how people actually use error messages to confidently make any kind of concrete proposal.

@jstewmon

jstewmon commented Jan 5, 2021

@LilRed thanks for the excellent clarifying summary.

I think I agree in principle with the "validate, then transform" approach, but there are very common cases with inputs like durations and timestamps (e.g. ISO 8601), which require a transformation to be performed to determine validity.

Reflecting on my original feedback, my desire was to leverage all of the existing ZodType checks and error messages as a matter of convenience for these cases where the input must be transformed before its validity can be determined.

I think the result of doing so is what @LilRed referred to as "utterly incomprehensible error messages" 😅 , since, in my duration example, the input is required to be a string, but if the string isn't a valid duration, the error might be:

Error: [
  {
    "code": "invalid_type",
    "expected": "number",
    "received": "string",
    "path": [],
    "message": "Expected number, received string"
  }
]

However, I wouldn't call this incomprehensible; I would call it subtle. Whether the result is comprehensible depends on whether it allows the failed check to be disambiguated, which is often possible through inference without being explicit. (Not to argue that being inferential is better than being explicit.)

@colinhacks you probably already thought of this, but for the sake of being complete, I want to try to elaborate on what I see as the shortcoming of using refine...

Taking this example:

const duration = z
  .string()
  .transform(ms)
  .refine((val) => z.number().positive().max(30_000).safeParse(val).success, {
    message: "invalid duration",
  });

If any of the refine checks fail, we won't know which one because there's no way to percolate the error details.

Using refine, we have to call refine with a predicate and options for every check we want to discretely identify. I think this shortcoming would be addressed by your previous suggestion to have a complementary refineWithSchema method, but your last comment sounds like you've reconsidered:

But since refinements and transformations can be interleaved under this proposal, there's no guarantee that the structure of the data in a refinement bears any resemblance to the input data structure. So I'm probably not going to do that.

I suppose you mean that, philosophically, performing validation after a transform shouldn't be supported. But, if refine can be used after transform, why not refine with a schema?

@carlpaten
Collaborator

carlpaten commented Jan 5, 2021

After some thinking and discussion here are the three families of solutions I can think of, each with various pitfalls. The last one is my favorite. All code is pseudo-Zod because I don't remember the method signatures off the top of my head.


The key issue with interleaving transformation and validation steps is that validation errors that come after one or more transformations are difficult to contextualize. Toy example:

z.string()
  .transform(s => s.slice(4))
  .refine(s => s.length > 5, "expected string of length greater than 5")
  .parse('foobar');

// Throws: {msg: "expected string of length greater than 5"}

This will throw expected string of length greater than 5 even though the input has length 6. I'm not sure whether that's often a problem in practice; in particular, for form validation there's just no way this will come up. I'm still concerned, though: if this is what's settled on, I would advise my team to avoid putting validation steps after transformation steps.


If we really want to support interleaved transformation and validation steps without sacrificing error clarity, we can draw inspiration from stack traces and have every operation provide a contextualization string.

z.string()
  .transform(s => s.slice(4), "strip first four characters")
  .refine(s => s.length > 5, "expected string of length greater than 5")
  .parse("foobar");

// Throws: { context: ["strip first four characters"], msg: "expected string of length greater than 5" }

With this approach transformation and validation steps can be interleaved to arbitrary depth while still retaining enough information to figure out what happened.

I can't say I would personally use this solution if it were available; I suspect it's overkill for almost every use case.


Instead of supporting arbitrary interleaving, we could go the opposite way: no interleaving at all! You can have a chain of validation steps followed by a chain of transformation steps. This means that every step that could fail is operating on the original input, so there is no need for contextualizing errors.

However, this sucks for the date parsing use case that @jstewmon mentions:

z.string()
  .refine(s => { try { myDateLib.parse(s); return true } catch { return false } }, "malformed date")
  .map(s => myDateLib.parse(s));

Duplication of work, both in code and at runtime. Ew!


Maybe in completely rejecting interleaving we went too far. Instead, we could go with the following design: a chain of validation steps, followed by ONE fallible transformation step, followed by a chain of infallible transformation steps. This still has the property that every fallible step is operating on the original input, so there is no need to contextualize errors.

This is my favored solution, so let me elaborate on it a bit by sketching out some types (again pseudo-Zod):

type ZodResult<T> = { success: true, value: T } | { success: false, errors: ZodError[] }

// Instances of ZodValidator include z.number(), z.string(), etc.
interface ZodValidator<T> {
  refine(predicate: (v: T) => boolean, msg: string): ZodValidator<T>;
  // More powerful alternative to refine?
  refineAlternative(predicate: (v: T) => ZodError | undefined): ZodValidator<T>;

  transform<U>(f: (v: T) => ZodResult<U>): ZodTransformer<T, U>;

  map<U>(f: (v: T) => U): ZodTransformer<T, U>;

  parse(v: unknown): T;
  safeParse(v: unknown): ZodResult<T>;
}

type ZodTransformer<T, U> = {
  map<V>(f: (v: U) => V): ZodTransformer<T, V>
  parse(v: T): U;
  safeParse(v: T): ZodResult<U>;
}

Usage:

z.transform((s) => {
  try {
    return {value: myDateLib.parse(s), success: true};
  } catch {
    return {error: 'input string not a date', success: false};
  }
});

@colinhacks
Owner Author

colinhacks commented Jan 6, 2021

I suppose you mean that, philosophically, performing validation after a transform shouldn't be supported. But, if refine can be used after transform, why not refine with a schema?

You're right that the logical conclusion of that train of thought is disallowing post-transform refinements — something I'm not willing to do.

There is an important difference, though, between post-transform refinements with .refine vs .refineWithSchema: refineWithSchema has the potential to provide "structured" errors (e.g. with a non-empty path), whereas .refine returns a "top-level" error (unless you pass in a path param). In this way plain refinements side-step the issue of producing error paths that don't reflect the original input.

As for the "incomprehensible vs subtle" debate...I'm not sure what I think.

@LilRed This is a great summary of the options here. Your proposal for refineAlternative sort of exists already: it's currently named _refinement since it's only used internally. It uses the ctx.addIssue approach I described in our call. (The return type is irrelevant/ignored.) Check out the implementation in types/base.ts to see it in action.

@jstewmon

jstewmon commented Jan 6, 2021

If we really want to support interleaved transformation and validation steps without sacrificing error clarity, we can draw inspiration from stack traces and have every operation provide a contextualization string.

I find this to be the most appealing because I think it is a neat generalization that can satisfy the most use cases. I sketched out some more complex scenarios to reason about how the context might be populated, and I ended up finding that a single path property works best with this approach:

const csvInput = "1,2,E,4";
const csvInts = z
  .string()
  .transform(z.array(z.string()), val => val.split(','), "split")
  .transform(z.array(z.number()), val => val.map(i => parseInt(i)), "map");
// ZodError { path: [{ xform: "split", path: [] }, { xform: "map", path: [{ property: 2 }] }] }

const csvIntsAlt = z
  .string()
  .transform(
    z.array(
      z.string().transform(z.number(), val => parseInt(val), "parseInt")
    ),
    val => val.split(','),
    "split"
  );
// ZodError { path: [{ xform: "split", path: [{ xform: "parseInt", path: [{ property: 2 }] }] }] }

const embedJSONInput = {
  name: "foo",
  policy: JSON.stringify({ effect: "allow", principal: "jstewmon", resource: "*" })
}

const embedJSON = z
  .object({
    name: z.string(),
    policy: z.string().transform(
      z.object({
        effect: z.enum(["allow", "deny"]),
        principal: z.string(),
        resource: z.string().refine(val => val !== "*", { message: "no wildcards" })
      }),
      JSON.parse,
      "parse json"
    )
  });
// ZodError { path: [{ property: "policy", path: [{ xform: "parse json", path: ["resource"] }] }] }

Note that this would be a breaking change to the path property: the way I see it being populated, sequences represent transform steps and nesting represents property / schema nesting. If this would be harder to handle than the current path structure in many cases, the more advanced structure could be called context and exist in addition to path.

@jstewmon

jstewmon commented Jan 6, 2021

@LilRed @colinhacks I wanted to take a step back and say that I think this has been a productive discussion, but I worry that I've been playing the role of antagonist in it, possibly to the detriment of committing to a satisfactory solution and making progress towards the next major release.

I know it can be difficult to actually gather feedback from users, so I've been happy to try out the proposed implementation, provide feedback on it, and consider alternatives.

My intent isn't to disengage from the discussion, but I think I've adequately expressed my feedback on this RFC, and I wouldn't characterize my position on it as dissenting. After all, most of the examples we've used are contrived, and the practical differences for users border on pedantic.

If you're satisfied with the mechanics of the original proposal and want to move forward with it, then you have my support. Maybe some of the ideas we've discussed make their way into the library in the future if needed.

If you want to continue deliberating this RFC, I'll keep replying. 😄

Thanks for the amazing library 🙏

@akutruff
Contributor

akutruff commented Jan 28, 2021

Please tell me if I'm missing something here. Doesn't this mean we can't introspect the shape of transformed types? If I do a merge() on two types I can still examine their shape.

I'm using the library for both parsing and for runtime schemas. I want to be able to introspect transformed types which is why I started to use this library in the first place. :(

For example, I want to write an ORM function that replaces every property of an object that has an id property with a string. This is somewhat straightforward in pure TypeScript. I can create the transform below, but then I do not have access to the shape of the type. This means my dispatch code will not work, as it needs to reflect on the shape of the transformed schema.

Jump to the bottom of the code example first to see what I mean.

   const Person = z.object({
      id: z.string(),
      name: z.string(),
      age: z.number(),
  });
  type Person = z.infer<typeof Person>;

  const Address = z.object({
      id: z.string(),
      street: z.string(),
      owner: Person
  });
  type Address = z.infer<typeof Address>;

  interface HasPropertiesOfType<PropTypes> {
      [index: string]: PropTypes
  }

  type MapPropertiesOfTypeToNewType<T, PropTypes, NewType> = T extends HasPropertiesOfType<PropTypes> ? {
      [P in keyof T]: NewType
  } : never;

  type PickAndMapPropertiesOfTypeToNewType<T, PropTypes, NewType> = MapPropertiesOfTypeToNewType<FilterProps<T, PropTypes>, PropTypes, NewType>;

  type PropsOfType<T, TPropTypes> = {
      [K in keyof T]: T[K] extends TPropTypes ? K : never;
  }[keyof T];

  type FilterProps<T, TPropTypes> = Pick<T, PropsOfType<T, TPropTypes>>;

  type ID = string;
  interface Identified {
      id: ID
  }

  type RelationalRep<T> = PickAndMapPropertiesOfTypeToNewType<T, Identified, ID>;

  type AnyObject = Record<string, unknown>;

  function isTypeIfPropDefined<T>(obj: Partial<T>, keyGetter: (x: typeof obj) => T[keyof T] | undefined): obj is T {
      return keyGetter(obj) !== undefined;
  }

  const transformToRelational = <T extends AnyObject>(value: T) => {
      const relational = {} as any;

      //can use the actual type Address type to pre-reflect and memoize the list of properties with id properties.

      for (const prop of Object.keys(value)) {
          const propValue = Reflect.get(value, prop);
          if (isTypeIfPropDefined<Identified>(propValue, x => x.id)) {
              relational[prop] = propValue.id;
          }
      }
      const result: RelationalRep<typeof value> = relational;
      return result;
  };

  const AddressRelations = Address.transform(transformToRelational);
  type AddressRelational = z.infer<typeof AddressRelations>;

 
  // Resolves to
  //
  // type AddressRelational = {
  //     owner: string;
  // }

  AddressRelations.shape; // does not exist :(

 

@carlpaten
Collaborator

carlpaten commented Jan 29, 2021

@akutruff this is an interesting example, and I learned a bunch of stuff from dissecting it. In order to facilitate discussion I've pared it down as much as I could while still preserving the essentials:

import * as z from "zod";

const Person = z.object({
  id: z.string(),
  name: z.string(),
});
type Person = z.infer<typeof Person>;

const Address = z.object({
  id: z.string(),
  street: z.string(),
  owner: Person,
});
type Address = z.infer<typeof Address>;

// Returns the foreign keys on the address object
const getRelationsOnAddress = (address: Address): { owner: string } => {
  const relations = {} as any;

  for (const prop of Object.keys(address)) {
    const propValue = Reflect.get(address, prop);
    if (typeof propValue === 'object' && propValue !== null && 'id' in propValue) {
      relations[prop] = propValue.id;
    }
  }
  const result: { owner: string } = relations;
  return result;
};

const AddressRelations = Address.transform(getRelationsOnAddress);
AddressRelations.shape; //does not exist :(

Recall that .shape is a way of introspecting object schemas. So for example given the above Address.shape would be an object collecting schemas for id, street, and owner. The type of .shape is informative as to its contents.

It is not likely to be possible to implement .shape for arbitrary transform-ed types. Specifically, deriving a schema for an arbitrary subfield of an arbitrarily transformed schema is impossible, since Zod cannot see inside an arbitrary transformation function. You would be better served by rewriting transformToRelational to operate directly on Zod schemas. (If you decide to do this I would love to see the results!)
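
For the shallow case, a schema-domain version could look something like this sketch (type-level precision deliberately dropped; makeRelational here is hypothetical):

import { z } from "zod";

function makeRelational<T extends z.ZodRawShape>(schema: z.ZodObject<T>) {
  const newShape: Record<string, z.ZodTypeAny> = {};
  for (const [key, value] of Object.entries(schema.shape)) {
    // replace any property whose schema is an object containing an `id` field with z.string()
    newShape[key] = value instanceof z.ZodObject && "id" in value.shape ? z.string() : value;
  }
  return z.object(newShape);
}

// const AddressRelations = makeRelational(Address);
// AddressRelations.shape.owner; // a ZodString at runtime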

This means my dispatch code will not work, as it needs to reflect on the shape of the transformed schema.

Would you mind elaborating on this point?

@akutruff
Contributor

Thanks for taking the time to wade through it!

You hit the nail on the head - I totally want to work in the schema domain, just as merge() produces a new schema. I tried taking a crack at writing something like the following, but the generics weren't working out.

function makeRelational(myType: ZodType<Shape>): ZodType<Shape> {
  /* ... */
}

I tried to copy-paste the merge() method into my own code and convert it to a pure function. I couldn't get my head around the necessary generics for declaring a parameter as a ZodType, and some of the generics used by merge() were not exposed. I saw string literal types like 'strip' that I assume are there to help with type inference and then quickly bailed on the endeavor, since that seems to be internal implementation subject to change. My takeaway was that the library is designed to be fluent, and type manipulations/operations leverage the typing that comes with that fluency.

This means my dispatch code will not work, as it needs to reflect on the shape of the transformed schema.

Would you mind elaborating on this point?

Rereading what I wrote, I realize that I shouldn't have mixed that one sentence into the rest of it. Basically, once I have a parsed domain object, I'm going to shuttle along the actual ZodType in some headers. The shape will be used to display UI for object field editing, etc.

@carlpaten
Collaborator

Since I wrote my last comment I played around with Zod internals, and I've come to the conclusion that the current architecture does not in fact support schema-domain computation well! I've opened #307 to further discuss.

@VanCoding

I know I'm late to the party, but there's one (at least to me) very important topic that didn't really get attention yet: Two-Way transformations

Since I'm using zod (and plan to use it even more) for transferring data between two systems in a typesafe way, zod is usually involved in two steps:

  • serialization
  • deserialization

Or, like this:

client -> serialize -> network -> deserialize -> server
client <- deserialize <- network <- serialize <- server

So far we've only talked about deserialization, i.e. the parse(...) method. But shouldn't there also be a serialize(...) method that basically reverses the transformation? At least in my opinion, a transformer should always consist of a serialization and a deserialization function.

If I understand it correctly, that's what a Codec in io-ts does.

I know that zod-graphql is currently only a placeholder. But considering that such a thing should become a reality at some point, there also needs to be a solution for implementing scalars. And scalars are also bi-directional.

@Brodzko

Brodzko commented Mar 18, 2021

I know I'm late to the party, but there's one (at least to me) very important topic that didn't really get attention yet: Two-Way transformations

I second this! Currently working on making this work with two one-way schemas, but this would be a huge help.

On a related note, while working on this I noticed ZodTransformer is not extendable. Is this potentially something that could be implemented?

My use-case is adding/removing fields to/from "API"/"client" schemas post parsing, in a scenario @VanCoding mentioned above. But since most fields in both schemas don't really need transforming (the most obvious use-case would be date-strings/Date objects), I was thinking to define just one of them and get the other one by extending and overriding.

Both of these things work on their own, but I haven't found a way to combine them. Thoughts?

@colinhacks
Owner Author

Often transforms aren't reversible, so requiring an "inverse transform" in all cases limits you to toy examples - scalars that can be converted into each other without loss of information. Even something simple like z.string().transform(val => val.toUpperCase()) isn't reversible. People use transforms for all sorts of things like that.

In general I think defining a base object schema, then extending the base with one-way transformers is the way to go. I understand it's not optimal DX.
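
A sketch of that pattern (field names are illustrative):

import { z } from "zod";

const Base = z.object({ id: z.string() });

// wire -> domain: the API sends createdAt as an ISO string
const FromApi = Base.extend({ createdAt: z.string() }).transform((val) => ({
  ...val,
  createdAt: new Date(val.createdAt),
}));

// domain -> wire: the client holds a Date
const ToApi = Base.extend({ createdAt: z.date() }).transform((val) => ({
  ...val,
  createdAt: val.createdAt.toISOString(),
}));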

scalars are also bi-directional

@VanCoding what do you mean by that?

ZodTransformer is not extendable

@Brodzko what do you mean by that?

@Brodzko

Brodzko commented Mar 19, 2021

Often transforms aren't reversible, so requiring an "inverse transform" in all cases limits you to toy examples - scalars that can be converted into each other without loss of information. Even something simple like z.string().transform(val => val.toUppercase()) isn't reversible. People use transforms for all sorts of things like that.

Yeah, that's a good point. I guess we'd have to define both the to and from transforms separately, and I can see how that goes beyond the intended scope of this lib. I guess io-ts would currently be better suited for a case like this?

In general I think defining a base object schema, then extending the base with one-way transformers is the way to go. I understand it's not optimal DX.

Yeah, fairly verbose :/

ZodTransformer is not extendable

@Brodzko what do you mean by that?

Sorry, I now realise that was pretty ambiguous. I mean I cannot do this:

// additionalPropForClient would be useful to have in the model but it doesn't come from the API
const FromApiSchema = z.object({
  prop1: z.string(),
  prop2: z.boolean(),
}).transform(val => ({ ...val, additionalPropForClient: basedOnCondition(val.prop1, val.prop2) }));

const FromClientSchema = FromApiSchema.extend(...)

I guess the problem would be to reverse the final .transform on FromApiSchema (delete additionalPropForClient in this case).

Now that I think about it, there also isn't a good way to enforce that z.input<typeof FromApiSchema> corresponds to z.output<typeof FromClientSchema> and vice versa if that ever were necessary. But I digress.

@VanCoding

scalars are also bi-directional

@VanCoding what do you mean by that?

I don't know if that was the correct terminology, but in the context of graphql, there are "scalars", which are decoded when received by the server and encoded again when sent to the client. And that wouldn't be currently possible with zod.

In general I think defining a base object schema, then extending the base with one-way transformers is the way to go. I understand it's not optimal DX.

Especially with deeply nested structures, yes. But since encoding/decoding is not the main focus of this library, I can somewhat understand your position. Maybe io-ts is the better fit for such use cases then. It's just that I like the overall API of this library sooooo much :D

@carlpaten
Collaborator

@VanCoding out of curiosity, have you tried implementing bidirectional transformers as an extension to the latest Zod v3 pre-release? @colinhacks just made some changes that should help with defining user-side abstractions on top of Zod. If you give this a shot I would love to hear back.

@hmaurer

hmaurer commented Apr 12, 2021

I am also interested in bidirectional transformers; I started using Zod recently on a project and it feels like a missing piece. Would be curious to hear if there are plans for this as part of core Zod, or whether someone has implemented them in userland!

@colinhacks
Owner Author

colinhacks commented Apr 25, 2021

As @LilRed alluded to, anyone can now simply subclass ZodType to implement custom schema types. You could theoretically implement a subclass that accepts two schemas (A and B), an A->B mapper, and a B->A mapper, and implements new encode and decode methods. Though this won't be ideal, since you really need the base class (ZodType) to provide serialize/deserialize methods and have the ability to drop codecs into compound types like ZodObject...
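
For reference, a userland sketch along those lines without subclassing (Codec is a hypothetical wrapper, not a Zod API):

import { z } from "zod";

class Codec<A extends z.ZodTypeAny, B extends z.ZodTypeAny> {
  constructor(
    private wire: A,
    private domain: B,
    private decodeFn: (val: z.output<A>) => z.output<B>,
    private encodeFn: (val: z.output<B>) => z.output<A>
  ) {}

  decode(data: unknown): z.output<B> {
    return this.decodeFn(this.wire.parse(data));
  }

  encode(data: unknown): z.output<A> {
    return this.encodeFn(this.domain.parse(data));
  }
}

// const isoDate = new Codec(z.string(), z.date(), (s) => new Date(s), (d) => d.toISOString());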

@mmkal
Contributor

mmkal commented Aug 13, 2021

@colinhacks some examples of reversible, non-toy transforms:

date: string -> validate-as-iso-string -> Date
(reversed: Date -> .toISOString())

string -> base64-decode -> JSON.parse -> { date: -> date as above }
(reversed: { date: date } -> JSON.stringify -> base64-encode)

I've used both of those with a hideous io-ts abstraction in the past, so that you can create AWS Kinesis record parsers really nicely and easily, and they hydrate events like magic:

export const UserRecords = kinesisEvent(t.type({ firstName: t.string, lastName: t.string }))

export const handler = async (_ev: unknown) => {
  const ev = UserRecords.decode(_ev)
  if (ev._tag === 'Left') throw new TypeError('...')

  return ev.Records.map(r => r.firstName)
}

The same type can then provide strong types for creating records too:

UserRecords.encode({
  Records: [{ firstName: 'Abc', lastName: 'def' }],
})

Pressed send earlier than I meant to... I meant to say, I'm not actually sure what the right balance is. I moved away from io-ts and I don't miss it, because it feels too much like doing maths homework to be very productive in. Implementing that kinesisEvent helper above was pretty hard and my team were scratching their heads when they needed to review it. I'm doing this for my job, so zod is much, much better. But being able to do it was very useful! And it fits well into the parse, don't validate philosophy.

@whitebyte
Contributor

I would like to bring attention to this issue: #586
As far as I understand, since v3 all parsing happens before refinements, rendering the simple task of collecting all errors (both from the parsing step and from refinements) impossible. There are legitimate use cases where one needs this (e.g. HTML form submission, as mentioned in #586).

Also, https://github.com/colinhacks/zod/blob/master/ERROR_HANDLING.md is misleading (not updated after the v3 release?). Consider this example: https://codesandbox.io/s/distracted-hooks-ehisd?file=/src/index.ts
The code is copy-pasted from the document, but the error output is different. Namely, the refine error message is missing.

While I understand the motivation for this change, there should be a way to get the full list of errors. Currently I can't come up with any clean, non-hackish way to do so.

@scotttrinh
Collaborator

@whitebyte

Definitely understand your perspective and the pain felt there. I've made the comment in other contexts that I think it makes sense to write your parser in a way that actually reflects your input type, so if your input type is a form, you might consider writing a very stringy/permissive/optional-y schema with individual refinements, or lean on preprocess to get your loose data into a more structured format for the schema. Might be worth writing a guide on the current best practices around form validation.
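
For example, the preprocess route for loose form data might look like this (field names are illustrative):

import { z } from "zod";

const FormSchema = z.object({
  age: z.preprocess(
    (val) => (typeof val === "string" ? parseInt(val, 10) : val),
    z.number().int().min(0)
  ),
  subscribed: z.preprocess((val) => val === "on" || val === true, z.boolean()),
});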

Also, https://github.com/colinhacks/zod/blob/master/ERROR_HANDLING.md is misleading (not updated after the v3 release?).

Thanks for calling this out. We can update the docs to reflect the fact that parsing occurs before refinement.
