Defining New Schema Types 1.0

Jason Wolfe edited this page Jan 13, 2016 · 6 revisions

In this section, we will see the steps required to define new schemas.

The Schema protocol specifies two functions: spec and explain. The spec function returns a declarative representation of the schema, and the explain function returns a human-readable representation of it. There are three main types of schema specs: leaf, variant, and collection. Leaf schemas are atomic: they do not depend on other schemas. Variant and collection schemas are ultimately composed of leaf schemas.

Variant and collection schemas are the two alternative ways to compose schemas. In short, think of variant schemas as sum datatypes and collection schemas as product datatypes.

Variant schemas are defined by a sequence of mutually-exclusive options with guards; when matching against data, the schema checker finds the first guard that succeeds, and then checks the datum against the corresponding schema.

Collection schemas, on the other hand, specify how to build a complex schema out of smaller, constituent schemas: the overall schema matches structured data, and the constituent schemas match subparts of that data.
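To make the sum/product distinction concrete, here is a small sketch that uses only schemas that ship with the library. The names OptionalName and Person are made up for illustration.

```clojure
(require '[schema.core :as s])

;; Variant (sum-like): the datum is either nil or a string.
(def OptionalName (s/maybe s/Str))

;; Collection (product-like): a map whose parts are each checked
;; by their own constituent schema.
(def Person {:name s/Str :age s/Int})

(s/check OptionalName nil)                ;; => nil (matches)
(s/check OptionalName "Ada")              ;; => nil (matches)
(s/check Person {:name "Ada" :age 36})    ;; => nil (matches)
(s/check Person {:name "Ada" :age "36"})  ;; non-nil: :age is not an integer
```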

Schemas can be recursive too, so you can write schemas that are defined in terms of themselves. Let's take a look at how to define leaf, variant, and collection schemas.
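Before moving on, here is a brief sketch of a recursive schema. The Tree schema is a hypothetical example; it relies on s/recursive, which takes a var so that the self-reference can be resolved after the schema has been defined.

```clojure
(require '[schema.core :as s])

;; A tree node holds an integer and a vector of child nodes,
;; each of which must itself match Tree.
(def Tree
  {:value s/Int
   :children [(s/recursive #'Tree)]})

(s/check Tree {:value 1
               :children [{:value 2 :children []}
                          {:value 3 :children []}]})
;; => nil

(s/check Tree {:value 1
               :children [{:value "two" :children []}]})
;; non-nil: the nested :value is not an integer
```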

Leaf Schemas

A leaf schema is one that is defined without depending on any other schema. Each leaf schema is an instance of LeafSpec in leaf.cljx. LeafSpec has a single field which is the precondition. A precondition is a function of a value that returns a ValidationError if the value does not satisfy the precondition, and otherwise returns nil. A precondition is essentially a very simple checker.
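A precondition can also be built and called on its own, which may help in seeing what it does. The sketch below assumes that the precondition helper lives in the schema.spec.core namespace and takes a schema (used for error reporting), a predicate, and a function that builds the error description; the name positive-pre is made up.

```clojure
(require '[schema.core :as s]
         '[schema.spec.core :as spec])

;; Build a precondition that accepts only positive numbers.
(def positive-pre
  (spec/precondition (s/pred pos?) pos? #(list 'pos? %)))

(positive-pre 5)   ;; => nil: the value satisfies the precondition
(positive-pre -5)  ;; non-nil: an error describing why the value was rejected
```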

As an example, let's take a look at EqSchema implemented in schema/core.cljx. This schema is used to check whether some input data value x is equal to a given value v.

Let's see how this schema can be created and used for checking. For example, we can define a schema (eq "Schemas are cool!") that can be used to check whether a particular value is exactly equal to the string "Schemas are cool!". Let's first try with a positive example:

(require '[schema.core :as s])

(s/check (s/eq "Schemas are cool!") "Schemas are cool!")
> nil

The schema check succeeds (it returns nil) because the two strings are equal.

Now let's try a negative example:

(s/check (s/eq "Schemas are cool!") "Schemas are NOT cool!")
> (not (= "Schemas are cool!" "Schemas are NOT cool!"))

Here the schema check fails because the data value "Schemas are NOT cool!" does not match the value given when defining the EqSchema (namely "Schemas are cool!"). In this case, we see that the result is a validation error message explaining how the schema failed to validate.

Now that we have seen how EqSchema is used, let's see how it is implemented.

(defrecord EqSchema [v]
  Schema
  (spec [this] (leaf/leaf-spec (spec/precondition this #(= v %) #(list '= v %))))
  (explain [this] (list 'eq v)))

EqSchema implements the Schema protocol and checks that the input data value is equal to v. Its spec method returns a leaf-spec built from a precondition. The precondition uses an anonymous function, #(= v %), to check the argument for equality against v. If that check fails, a ValidationError is constructed whose description comes from #(list '= v %), which reports that the input value does not equal v. (For leaf schemas with really straightforward preconditions there is a helper macro, simple-precondition in spec/core.cljx, that takes the schema and a predicate function such as even? and does the reasonable thing.)
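Following the same pattern, a new leaf schema could look roughly like the sketch below. EvenSchema is a hypothetical schema written for this page, not part of the library, and the namespace aliases assume that spec/core.cljx and leaf.cljx correspond to schema.spec.core and schema.spec.leaf.

```clojure
(require '[schema.core :as s]
         '[schema.spec.core :as spec]
         '[schema.spec.leaf :as leaf])

;; A leaf schema that accepts only even integers (hypothetical example).
(defrecord EvenSchema []
  s/Schema
  ;; simple-precondition derives the error description from the predicate name.
  (spec [this] (leaf/leaf-spec (spec/simple-precondition this even?)))
  (explain [this] 'even))

(s/check (->EvenSchema) 4)  ;; => nil
(s/check (->EvenSchema) 3)  ;; non-nil: 3 does not satisfy even?
```

Because the record implements both spec and explain, it can be used anywhere a built-in schema can, for example as a value schema inside a map.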

There are many examples of leaf schemas, such as s/Num for matching numbers and s/pred for defining schemas that check an arbitrary predicate.
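A quick sketch of both in use follows. Percentage is a made-up name; the optional second argument to s/pred gives the anonymous predicate a readable name in error messages.

```clojure
(require '[schema.core :as s])

(s/check s/Num 3.14)    ;; => nil
(s/check s/Num "3.14")  ;; non-nil: a string is not a number

;; A predicate schema for numbers between 0 and 100 inclusive.
(def Percentage (s/pred #(<= 0 % 100) 'percentage?))

(s/check Percentage 50)   ;; => nil
(s/check Percentage 150)  ;; non-nil: the predicate returned false
```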

Leaf schemas are small and simple and serve as the building blocks for bigger, more complex schemas.

Variant Schemas

Variant schemas provide a way to define a single schema that matches one of several different variants, called options. Each option specifies a schema, a guard, and optionally an error-wrap function. The guards select which option to apply to a particular datum: each guard is a predicate called on the input datum, and the first option whose guard returns true is used to fully validate the datum. (The guard is optional for the final option.)

As an example of a variant schema, let's look at the ConditionalSchema defined in schema/core.cljx. This schema is used to specify a set of variant schemas where the variant is chosen based on properties of the input data.

Let's use a simple example to see how this powerful schema can be used for checking. We're going to define a schema that can check against one of many possible shapes: square, rectangle, circle.

(s/defschema Shape
  "A schema for shapes: squares, rectangles, and circles"
  (s/conditional
    #(= (:type %) :square) {:type (s/eq :square) :side s/Num}
    #(= (:type %) :rectangle) {:type (s/eq :rectangle) :width s/Num :height s/Num}
    #(= (:type %) :circle) {:type (s/eq :circle) :radius s/Num}))

Each Shape is a map with a :type field. The other fields in the shape map are determined by the value of :type: :squares have a :side, :rectangles have a :width and :height, and :circles have a :radius.
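Here is how checking against Shape might look in practice. The block repeats the definition so that it runs on its own; the sample data is invented for illustration.

```clojure
(require '[schema.core :as s])

(s/defschema Shape
  "A schema for shapes: squares, rectangles, and circles"
  (s/conditional
    #(= (:type %) :square) {:type (s/eq :square) :side s/Num}
    #(= (:type %) :rectangle) {:type (s/eq :rectangle) :width s/Num :height s/Num}
    #(= (:type %) :circle) {:type (s/eq :circle) :radius s/Num}))

(s/check Shape {:type :square :side 2})                ;; => nil
(s/check Shape {:type :rectangle :width 2 :height 3})  ;; => nil

;; The :circle guard matches, but the selected schema rejects the radius.
(s/check Shape {:type :circle :radius "big"})          ;; non-nil

;; No guard matches at all, so the conditional itself reports an error.
(s/check Shape {:type :triangle :base 1})              ;; non-nil
```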

The implementation of the ConditionalSchema is about as pure an example of a variant schema as possible:

(defrecord ConditionalSchema [preds-and-schemas]
  Schema
  (spec [this]
    (variant/variant-spec
     spec/+no-precondition+
     (for [[p s] preds-and-schemas]
       {:guard p :schema s}) ;; specify the guard and schema for each variant
     #(list 'matches-some-condition? %)))
  (explain [this]
    (->> preds-and-schemas
         (mapcat (fn [[pred schema]] [pred (explain schema)]))
         (cons 'conditional))))

The spec returned is a variant spec, where the guard and internal schemas for all the variants are taken directly from the predicate and corresponding schema for the different branches in the conditional. Other examples of variant schemas are the Maybe and Recursive schemas in schema/core.cljx.
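A custom variant schema can be sketched along the same lines. StrOrNum below is a hypothetical schema, and the aliases assume the spec files map to the schema.spec.core and schema.spec.variant namespaces. Since the final option has no guard, it acts as the fallback and no error function is passed to variant-spec.

```clojure
(require '[schema.core :as s]
         '[schema.spec.core :as spec]
         '[schema.spec.variant :as variant])

;; Accepts either a string or a number (hypothetical example).
(defrecord StrOrNum []
  s/Schema
  (spec [this]
    (variant/variant-spec
     spec/+no-precondition+
     [{:guard string? :schema s/Str} ;; chosen when the datum is a string
      {:schema s/Num}]))             ;; final option: no guard needed
  (explain [this] 'str-or-num))

(s/check (->StrOrNum) "hello")  ;; => nil
(s/check (->StrOrNum) 42)       ;; => nil
(s/check (->StrOrNum) :nope)    ;; non-nil: falls through to s/Num and fails
```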

Collection Schemas

The last type of schema is the collection schema, which is used for describing composite data types that are composed of leaf schemas, variant schemas, or even other collection schemas.

Collection schemas can be used to check the shape of collections such as maps, vectors, and sets. For example, a schema that matches sets of numbers can be defined as: #{s/Num}.

Under the hood, schema extends the Schema protocol to sets, where the returned spec is a collection-spec:

(extend-protocol Schema
  clojure.lang.APersistentSet
  (spec [this]
    (macros/assert! (= (count this) 1) "Set schema must have exactly one element")
    (collection/collection-spec
     (spec/simple-precondition this set?) ;; precondition
     set ;; constructor
     [(collection/all-elements (first this))] ;; elements
     (fn [_ xs _] (set (keep utils/error-val xs))))) ;; on-error
  (explain [this] (set [(explain (first this))])))

Note that the set schema requires every element of the set to match the specified schema. For instance, (s/check #{s/Num} #{-1337 3.14 42}) passes, but (s/check #{s/Num} #{"not a number" 3.14 42}) does not, because "not a number" doesn't match s/Num.
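These checks can be tried directly at the REPL. The last call is an extra illustration of the set? precondition rejecting an input that is not a set.

```clojure
(require '[schema.core :as s])

(s/check #{s/Num} #{-1337 3.14 42})             ;; => nil
(s/check #{s/Num} #{"not a number" 3.14 42})    ;; non-nil: one element is not a number
(s/check #{s/Num} [1 2 3])                      ;; non-nil: a vector fails the set? precondition
```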

Concretely, the CollectionSpec constructor takes a precondition, a constructor, a sequence of element specs, and an on-error function. The precondition does a superficial check on the input to see if the schema applies (in our example, it verifies that the input passes the set? predicate). The constructor specifies how to build the collection from its elements (in our example, the set function builds a set when passed a collection of elements). The element specs are a sequence of maps, each of which specifies a schema for the element, a cardinality (exactly-one, at-most-one, or zero-or-more), and a parser that takes an item-fn and a partial datum. The particular details of item-fn depend on the use case; for schema checking, item-fn ends up recursively checking the value and appending the result to a list. Whatever the implementation of item-fn, the parser applies it to all matching items and returns the remaining items from the collection. The on-error function builds the value returned when checking fails (in our example, it collects the element errors into a set).

There are helper functions for constructing element specs. The set example above uses the all-elements helper in spec/collection.cljx, which constructs an element spec that matches zero or more elements. There is also a helper called one-element, which specifies an optional or required element.

All of the core collection schemas (maps, vectors, sequences, sets, etc.) are already implemented, but it's now easier than ever to add your own if you need to.
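As a rough sketch of adding your own, the record below wraps an element schema and reuses the same pieces as the built-in set implementation shown earlier. SetOf is a hypothetical name, and the aliases assume the namespaces schema.spec.core, schema.spec.collection, and schema.utils.

```clojure
(require '[schema.core :as s]
         '[schema.spec.core :as spec]
         '[schema.spec.collection :as collection]
         '[schema.utils :as utils])

;; A record-based set schema, modeled on the built-in set schema (hypothetical).
(defrecord SetOf [elem-schema]
  s/Schema
  (spec [this]
    (collection/collection-spec
     (spec/simple-precondition this set?)            ;; precondition
     set                                             ;; constructor
     [(collection/all-elements elem-schema)]         ;; elements
     (fn [_ xs _] (set (keep utils/error-val xs))))) ;; on-error
  (explain [this] (list 'set-of (s/explain elem-schema))))

(s/check (->SetOf s/Num) #{1 2 3})     ;; => nil
(s/check (->SetOf s/Num) #{1 "two"})   ;; non-nil: "two" is not a number
(s/check (->SetOf s/Num) [1 2 3])      ;; non-nil: not a set
```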

And More!

The LeafSpec, VariantSpec, and CollectionSpec should cover nearly all use cases for defining new schema types. However, the system is designed to be extensible, so if you're in the 0.001% who need to define an entirely new type of spec that does not cleanly fall into the leaf, variant, or collection cases, all you have to do is extend the CoreSpec protocol in spec/core.cljx. To do so, you just need to implement two functions: subschemas, which returns all schemas contained within the schema, and checker. After extending the protocol, your new type of schema will just work with all the schema infrastructure, like the checker.