Generative Testing

Posted on April 25, 2017

In this post, we’re going to walk through building a generator for valid GraphQL schemas specified in the GraphQL Schema Language (sans arguments). We will take a bottom-up approach: start with primitive generators, compose them into higher-level generators, and compose those in turn (à la Stratified Design [1]), working our way up to the final goal: a single simple generator that produces a randomly generated, valid GraphQL schema each time it is sampled. Along the way, we will pause to check that each primitive or composition of primitives produces correct values.

Download Full Source

First, let’s require test.check.

cljs.user=> (require '[clojure.test.check.generators :as gen :include-macros true])
true

Now we can begin defining our primitive generators. For the typenames for our object types let’s use the capital letters: A-Z. gen/elements takes a collection of values and produces a generator that randomly chooses an element from the collection:

cljs.user=> (def a-to-z-gen (gen/elements (map char (range 65 (+ 65 26)))))
#'cljs.user/a-to-z-gen

To convince ourselves that this is working, let’s try sampling a few items using the function gen/sample. gen/sample takes two arguments: a generator and the number of samples to generate. (There is also a single-arity version, which generates 10 samples by default.)

cljs.user=> (gen/sample a-to-z-gen 5)
("K" "M" "W" "G" "S")

Now that we can generate individual typenames for our objects, let’s build a generator that will produce a set of typenames given the first generator. Each of these sets will eventually specify the names of the types comprising an individual schema.

cljs.user=> (def entity-set-gen (gen/set a-to-z-gen {:min-elements 8 :max-elements 25}))
#'cljs.user/entity-set-gen

gen/set conveniently accepts the options :min-elements and :max-elements, which let us bound the size of the generated sets.
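In the spirit of pausing to check each building block, here is a minimal sketch that asserts the size bounds actually hold. The names letters and in-bounds? are introduced only for this illustration, and the typenames are built with str rather than char so the snippet does not depend on how the host platform represents characters.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true])

;; Single-letter typenames "A".."Z" (illustrative helper, not from the post).
(def letters (map str "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

(def entity-set-gen
  (gen/set (gen/elements letters) {:min-elements 8 :max-elements 25}))

(defn in-bounds?
  "True when s is a set of 8 to 25 known typenames."
  [s]
  (and (set? s)
       (<= 8 (count s) 25)
       (every? (set letters) s)))

;; Every sampled set should respect the bounds passed to gen/set.
(assert (every? in-bounds? (gen/sample entity-set-gen 100)))
```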

Sampling a few of these with sample-one, a small helper that draws a single value from a generator, gives:

cljs.user=> (defn sample-one [generator] (first (gen/sample generator 1)))
#'cljs.user/sample-one
cljs.user=> (sample-one entity-set-gen)
#{"K" "Q" "L" "G" "J" "M" "S" "Y" "H" "C" "B" "V" "U" "A" "W"}
cljs.user=> (sample-one entity-set-gen)
#{"T" "Q" "J" "M" "S" "Y" "E" "F" "A" "I" "D"}
cljs.user=> (sample-one entity-set-gen)
#{"Q" "J" "S" "Y" "C" "B" "V" "U" "O" "N" "I"}
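One thing worth knowing about sample-one: gen/sample walks through sizes starting at 0, so taking only the first sample always yields a size-0 value. That is harmless for generators that ignore size (such as gen/elements), but size-sensitive generators will only ever show their smallest outputs. A hedged alternative, assuming a test.check version that provides gen/generate, is sketched below; sample-one-sized is a hypothetical name, not part of the original code.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true])

;; Hypothetical variant of sample-one: draw one value at an explicit size.
;; gen/generate defaults to size 30 when no size is given.
(defn sample-one-sized
  ([generator] (gen/generate generator))
  ([generator size] (gen/generate generator size)))

;; At size 0 an unbounded vector is always empty; larger sizes allow
;; longer vectors (never longer than the size itself).
(assert (= [] (sample-one-sized (gen/vector gen/nat) 0)))
(assert (<= (count (sample-one-sized (gen/vector gen/nat) 10)) 10))
```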

With a generator for complete sets of typenames in hand, let us compose it with a generator of unique pairs of related types/entities. Two types are considered related if at least one of them has a field whose type is the other. The relation may be bidirectional (mutual); when it is merely unidirectional, multiple fields may participate in it. [2] We will consider these restrictions more closely as we proceed but, for now, two basic requirements suffice: no relation may be repeated, and relations are unordered; i.e., [A B] is the same as [B A].

First, let’s define relation-gen:

cljs.user=> (defn relation-gen [entity-set]
              (gen/list-distinct (gen/elements entity-set) {:num-elements 2}))
#'cljs.user/relation-gen

And sample it a few times:

cljs.user=> (sample-one (relation-gen #{"A" "B" "C" "D"}))
("C" "D")
cljs.user=> (sample-one (relation-gen #{"A" "B" "C" "D"}))
("C" "A")
cljs.user=> (sample-one (relation-gen #{"A" "B" "C" "D"}))
("A" "D")
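Eyeballing three samples is reassuring, but the two properties we care about here (two distinct endpoints, both drawn from the entity set) are easy to assert over many samples. The following is a small sketch; entities and valid-relation? are names made up for this check.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true])

(defn relation-gen [entity-set]
  (gen/list-distinct (gen/elements entity-set) {:num-elements 2}))

(def entities #{"A" "B" "C" "D"})

(defn valid-relation?
  "True when relation is a pair of two different members of entities."
  [[left right :as relation]]
  (and (= 2 (count relation))
       (not= left right)
       (contains? entities left)
       (contains? entities right)))

(assert (every? valid-relation? (gen/sample (relation-gen entities) 100)))
```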

And then combine it with entity-set-gen:

cljs.user=>
(def entities-and-relations-gen
  (gen/let [entity-set entity-set-gen
            relations (gen/list-distinct-by set (relation-gen entity-set)
                       {:min-elements (int (* (count entity-set) (/ 2 3)))
                        :max-elements (int (* (count entity-set) 3))})]
    {:entities entity-set :relations relations}))
#'cljs.user/entities-and-relations-gen

The above form uses gen/let, which has semantics similar to the built-in let, except that the right-hand sides of the binding vector are generators and the symbols on the left are bound to values drawn from those generators. Later bindings may refer to earlier ones, which is how relations can depend on entity-set. Also of note is list-distinct-by, which is just like list-distinct except that it takes a key function to apply to each element before the equality check. Here the key function is set, so ("A" "B") and ("B" "A") count as duplicates.

Let’s take a sample from this latest generator:

cljs.user=> (pprint (sample-one entities-and-relations-gen))
{:entities #{"T" "L" "Y" "F" "P" "O" "N" "W" "D"},
 :relations
 (("W" "T")
  ("Y" "N")
  ("P" "O")
  ("L" "T") 
  ("F" "N") 
  ("T" "N") 
  ("O" "F") 
  ("Y" "L") 
  ("D" "T"))}
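To see the effect of using set as the key function in list-distinct-by, consider a deliberately tight case: four entities admit only six unordered pairs, so asking for exactly six relations should produce each pair once, regardless of the order within a pair. This is only a sketch; all-pairs-gen is an illustrative name, and :max-tries is raised from its default because the last missing pair can take a while to turn up.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true])

(defn relation-gen [entity-set]
  (gen/list-distinct (gen/elements entity-set) {:num-elements 2}))

;; Exactly six relations, distinct when compared as sets.
(def all-pairs-gen
  (gen/list-distinct-by set (relation-gen #{"A" "B" "C" "D"})
                        {:num-elements 6 :max-tries 1000}))

(let [relations (first (gen/sample all-pairs-gen 1))]
  ;; Six relations, and still six after ignoring order within each pair.
  (assert (= 6 (count relations)))
  (assert (= 6 (count (distinct (map set relations))))))
```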

Each side of a relation may have a multiplicity of zero, one or many so let’s define a data structure representing these:

cljs.user=> (def multiplicities #{:zero :one :many})
#'cljs.user/multiplicities 

And a generator for them:

cljs.user=> (def multiplicity-gen (gen/elements multiplicities))
#'cljs.user/multiplicity-gen

We will also need a non-zero multiplicity generator (i.e., one that yields either :one or :many):

cljs.user=> (require '[clojure.set])
nil
cljs.user=> (def non-zero-mult-gen (gen/elements (clojure.set/difference multiplicities #{:zero})))
#'cljs.user/non-zero-mult-gen

These work as expected:

cljs.user=> (assert (every? #(multiplicities %) (gen/sample multiplicity-gen 10000)))
nil
cljs.user=> (assert (every? #(#{:one :many} %) (gen/sample non-zero-mult-gen 10000)))
nil

Now we will define a Cardinality as two Multiplicities (one for each side of a relation):

cljs.user=> (defrecord Multiplicity [entity field multiplicity required?])
cljs.user/Multiplicity
cljs.user=> (defrecord Cardinality [left right])
cljs.user/Cardinality

Here is a wrapper around goog.string/format which we will use for string building:

cljs.user=> (require '[goog.string.format])
true
cljs.user=> 
(defn format
  "Formats a string using goog.string.format."
  [fmt & args]
  (apply goog.string/format fmt args))
#'cljs.user/format

And a function that produces generators of Cardinalities from relations:

cljs.user=> (require '[clojure.string :refer [lower-case]])
nil
cljs.user=> 
(defn relation->cardinalities-gen [[left right]]
  (gen/let [lmult multiplicity-gen
            repeats (gen/choose 1 5)]
    (let [create-card
          #(Cardinality.
            (Multiplicity. left (lower-case right)
                           lmult (sample-one gen/boolean))
            (Multiplicity. right %1 (sample-one non-zero-mult-gen)
                           (sample-one gen/boolean)))]
      (condp = lmult
        :zero (map
               create-card
               (map #(format "%s%s" (lower-case left) %1) (range repeats)))
        [(create-card (lower-case left))]))))
#'cljs.user/relation->cardinalities-gen

Let’s break that down a bit. First, we generate a multiplicity for the left side of the relation and a number of repeats (which is only used if the left side has a multiplicity of :zero). Then we define a simple helper function for constructing a cardinality: create-card. Notice that the field names are merely the destination type names, lower-cased, with a unique integer index appended for multiple-link relations only. The right side’s multiplicity comes from non-zero-mult-gen, which simplifies downstream logic by reducing the number of cases we need to consider. Finally, both branches of the condp (i.e., left multiplicity zero or non-zero) produce a sequence of Cardinality instances: repeats of them when the left multiplicity is :zero, exactly one otherwise.
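One caveat about create-card: calling sample-one inside the body of gen/let draws values from a separate random source, so, as far as I can tell, those booleans and multiplicities are not governed by the test run's seed and do not take part in shrinking. A possible alternative, sketched here under that assumption, is to bind every random choice in the gen/let itself. The names card-params-gen and relation->cardinalities-gen* are hypothetical, and the records are redefined only to keep the snippet self-contained.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true]
         '[clojure.string :refer [lower-case]])

(defrecord Multiplicity [entity field multiplicity required?])
(defrecord Cardinality [left right])

(def multiplicity-gen (gen/elements [:zero :one :many]))
(def non-zero-mult-gen (gen/elements [:one :many]))

;; All random choices needed for one cardinality:
;; [left-required? right-multiplicity right-required?]
(def card-params-gen
  (gen/tuple gen/boolean non-zero-mult-gen gen/boolean))

(defn relation->cardinalities-gen* [[left right]]
  (gen/let [lmult   multiplicity-gen
            repeats (gen/choose 1 5)
            ;; one parameter tuple per cardinality we are about to build
            params  (gen/vector card-params-gen (if (= :zero lmult) repeats 1))]
    (map-indexed
     (fn [i [lreq? rmult rreq?]]
       (Cardinality.
        (Multiplicity. left (lower-case right) lmult lreq?)
        (Multiplicity. right
                       ;; indexed field names only for multiple-link relations
                       (if (= :zero lmult)
                         (str (lower-case left) i)
                         (lower-case left))
                       rmult rreq?)))
     params)))

(doseq [cards (gen/sample (relation->cardinalities-gen* ["J" "B"]) 50)
        {:keys [left right]} cards]
  (assert (= "b" (:field left)))
  (assert (#{:one :many} (:multiplicity right))))
```

The output has the same shape as before, so the downstream generators would not need to change.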

Next up we have a fairly straightforward generator to combine our entities, relations & cardinalities into a single structure:

cljs.user=> 
(def entities-and-cardinalities-gen
  (gen/let [{:keys [entities relations]} entities-and-relations-gen
            cardinalities (apply gen/tuple (map relation->cardinalities-gen relations))]
    {:entities entities :relations relations :cardinalities (flatten cardinalities)}))
#'cljs.user/entities-and-cardinalities-gen 

And here’s an abbreviated sample of its output:

{:entities
 #{"T" "Q" "L" "G" "J" "M" "S" "Z" "H" "R" "C" "F" "B" "V" "O" "X" "N"
   "A" "I" "W" "D"},
 :relations
 (("J" "B") ("N" "I") ("Z" "G") ("H" "R") ("Z" "V") ("G" "V") ...)
 :cardinalities
  ({:left
   {:entity "J", :field "b", :multiplicity :zero, :required? true},
   :right
   {:entity "B", :field "j0", :multiplicity :many, :required? true}}
  {:left
   {:entity "N", :field "i", :multiplicity :zero, :required? false},
   :right
   {:entity "I", :field "n0", :multiplicity :one, :required? false}}
  {:left
   {:entity "Z", :field "g", :multiplicity :zero, :required? false},
   :right
   {:entity "G", :field "z0", :multiplicity :one, :required? true}}
   {:left
   {:entity "J", :field "l", :multiplicity :one, :required? true},
   :right
   {:entity "L", :field "j", :multiplicity :many, :required? true}}
  {:left
   {:entity "H", :field "a", :multiplicity :many, :required? true},
   :right
   {:entity "A", :field "h", :multiplicity :one, :required? false}}
   ...)}

Now let’s define a record to represent the description of a field in a GraphQL entity specification:

cljs.user=> (defrecord FieldDescriptor [field type multiplicity required?])
cljs.user/FieldDescriptor

And a generator for a plain (i.e., scalar or non-object/non-reference) field descriptor:

cljs.user=> (require '[camel-snake-kebab.core :refer [->camelCase ->PascalCase]])
true
cljs.user=> (def scalar-types #{"Boolean" "String" "Float" "Timestamp" "Int"})
#'cljs.user/scalar-types
cljs.user=> 
(defn field-gen [prefix]
  (let [nonempty-char-alpha-gen
        (gen/not-empty (gen/fmap clojure.string/join
                                 (gen/vector gen/char-alpha)))]
    (gen/fmap (partial apply ->FieldDescriptor)
              (gen/tuple (gen/fmap #(format "%s%s" prefix (->PascalCase %1))
                                   nonempty-char-alpha-gen)
                         (gen/elements scalar-types)
                         non-zero-mult-gen
                         gen/boolean))))
#'cljs.user/field-gen 

And take a few samples [3]:

cljs.user=> (sample-one (field-gen "test"))
#cljs.user.FieldDescriptor{:field "testQk", :type "Boolean", :multiplicity :many, :required? true}
cljs.user=> (sample-one (field-gen "test"))
#cljs.user.FieldDescriptor{:field "testH", :type "Timestamp", :multiplicity :one, :required? true}
cljs.user=> (sample-one (field-gen "test"))
#cljs.user.FieldDescriptor{:field "testE", :type "Int", :multiplicity :one, :required? false}

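Next we will define a function that derives an object (reference) field descriptor from a cardinality, as seen from one of its two entities. It returns nil when that entity’s side of the relation has a multiplicity of :zero: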

cljs.user=> 
(defn cardinality->obj-field-descriptor [entity {:keys [left right]}]
  ;; `this` is the side of the cardinality belonging to entity; `other` is the opposite side.
  (let [[this other] (if (= entity (:entity left)) [left right] [right left])]
    (when (not= :zero (:multiplicity this))
      (FieldDescriptor. (:field this) (:entity other) (:multiplicity this) (:required? this)))))
#'cljs.user/cardinality->obj-field-descriptor

And then combine them together to describe complete entities:

cljs.user=> 
(defn entity->cardinalities [entity cardinalities]
  (filter #(or (= entity (-> %1 :left :entity)) (= entity (-> %1 :right :entity))) cardinalities))
#'cljs.user/entity->cardinalities
cljs.user=> 
(defn entity->descriptors-gen [entity cardinalities]
  (let [filtered (entity->cardinalities entity cardinalities) ; Note: O(n^2), can be optimized
        obj-fields (remove nil? (map (partial cardinality->obj-field-descriptor entity) filtered))
        default-field (FieldDescriptor. "id" "ID" :one true)]
    (gen/let [num-fields (gen/choose 2 8)
              plain-fields (apply gen/tuple (repeat num-fields (field-gen (->camelCase entity))))]
      {:entity entity :fields (concat [default-field] plain-fields obj-fields)})))
#'cljs.user/entity->descriptors-gen
cljs.user=> 
(def entity-descriptors-gen
  (gen/let [{:keys [entities relations cardinalities]} entities-and-cardinalities-gen
            entity-descriptors (apply gen/tuple (map #(entity->descriptors-gen %1 cardinalities) entities))]
    entity-descriptors))
#'cljs.user/entity-descriptors-gen

And sampling one complete entity gives:

cljs.user=> (pprint (-> (sample-one entity-descriptors-gen) first))
{:entity "K",
 :fields
 ({:field "id", :type "ID", :multiplicity :one, :required? true}
  {:field "kB", :type "Int", :multiplicity :many, :required? false}
  {:field "kO", :type "String", :multiplicity :one, :required? true}
  {:field "u", :type "U", :multiplicity :many, :required? true}
  {:field "r", :type "R", :multiplicity :one, :required? true})}
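Note that the plain fields are drawn independently of one another, so nothing in the tuple-based approach appears to prevent two of them from receiving the same name within one entity. If that matters for the system under test, one option is gen/vector-distinct-by keyed on :field. The sketch below uses a simplified stand-in for field-gen (plain maps, single-letter suffixes, no PascalCase conversion) to stay dependency-free; simple-field-gen and distinct-fields-gen are hypothetical names.

```clojure
(require '[clojure.test.check.generators :as gen :include-macros true]
         '[clojure.string :as string])

;; Simplified stand-in for field-gen: prefix plus one upper-case letter.
(defn simple-field-gen [prefix]
  (gen/fmap (fn [[suffix type]] {:field (str prefix suffix) :type type})
            (gen/tuple (gen/fmap (comp string/upper-case str) gen/char-alpha)
                       (gen/elements ["Boolean" "String" "Float" "Int"]))))

;; Between 2 and 8 fields whose names are pairwise distinct.
(defn distinct-fields-gen [prefix]
  (gen/vector-distinct-by :field (simple-field-gen prefix)
                          {:min-elements 2 :max-elements 8 :max-tries 100}))

(assert (every? #(apply distinct? (map :field %))
                (gen/sample (distinct-fields-gen "k") 100)))
```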

Now all that is left is to emit the GraphQL Schema Language given an entity description record:

cljs.user=> 
(defn emit-field [fd]
  (when (#{:one :many} (:multiplicity fd))
    (format "  %s: %s%s%s%s"
            (:field fd)
            (if (= (:multiplicity fd) :many) "[" "")
            (:type fd)
            (if (= (:multiplicity fd) :many) "]" "")
            (if (:required? fd) "!" ""))))
#'cljs.user/emit-field
cljs.user=> 
(defn emit-type [{:keys [entity fields]}]
  (format "type %s {\n%s\n}"
          entity
          (clojure.string/join "\n" (remove nil? (map emit-field fields)))))
#'cljs.user/emit-type
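To make the emitted syntax concrete, here is a small worked example on a hand-written descriptor. It restates the two functions with str instead of the format wrapper so that it runs on its own; emit-field*, emit-type* and example-entity are illustrative names, and plain maps stand in for the FieldDescriptor records.

```clojure
(require '[clojure.string :as string])

;; Restatement of emit-field using str; returns nil for :zero fields.
(defn emit-field* [{:keys [field type multiplicity required?]}]
  (when (#{:one :many} multiplicity)
    (str "  " field ": "
         (if (= multiplicity :many) (str "[" type "]") type)
         (when required? "!"))))

;; Restatement of emit-type; keep drops the nil (skipped) fields.
(defn emit-type* [{:keys [entity fields]}]
  (str "type " entity " {\n"
       (string/join "\n" (keep emit-field* fields))
       "\n}"))

(def example-entity
  {:entity "K"
   :fields [{:field "id" :type "ID"  :multiplicity :one  :required? true}
            {:field "kB" :type "Int" :multiplicity :many :required? false}
            {:field "u"  :type "U"   :multiplicity :zero :required? true}]})

;; The :zero field is omitted; :many wraps the type in brackets.
(assert (= "type K {\n  id: ID!\n  kB: [Int]\n}"
           (emit-type* example-entity)))
```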

And join a sequence of them together into the full schema:

cljs.user=> 
(def schema-str-gen
  (gen/let [entity-descriptors entity-descriptors-gen]
    (clojure.string/join "\n\n" (map emit-type entity-descriptors))))
#'cljs.user/schema-str-gen

And the final product:

cljs.user=> (print (sample-one schema-str-gen))
type H {
  id: ID!
  hX: [Boolean]!
  hY: Int!
  hL: Boolean
  hO: [String]!
  x: X
  r: R
  m: [M]
  y: [Y]!
}

type Y {
  id: ID!
  yUk: [Boolean]!
  yI: [Timestamp]
  yA: Int!
  q: Q
  h: H
}
...
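With a schema generator available, the usual next step is to feed it to a property. The sketch below shows the general shape using quick-check and prop/for-all. Because it has to run on its own, it uses a tiny stand-in generator (tiny-schema-gen) rather than schema-str-gen, and the property checked (no type name is declared twice) is only an example; type-names is a hypothetical helper, and a real test would more likely hand the schema to the system under test.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen :include-macros true]
         '[clojure.test.check.properties :as prop :include-macros true]
         '[clojure.string :as string])

;; Stand-in for schema-str-gen: each typename becomes a one-field type.
(def tiny-schema-gen
  (gen/fmap (fn [names]
              (string/join "\n\n"
                           (map #(str "type " % " {\n  id: ID!\n}") names)))
            (gen/set (gen/elements (map str "ABCDEFGH"))
                     {:min-elements 1 :max-elements 8})))

;; Extracts the names from `type X {` headers.
(defn type-names [schema]
  (map second (re-seq #"type (\w+) \{" schema)))

(def distinct-type-names-prop
  (prop/for-all [schema tiny-schema-gen]
    (let [names (type-names schema)]
      (boolean (and (seq names) (apply distinct? names))))))

(assert (true? (:result (tc/quick-check 50 distinct-type-names-prop))))
```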

  1. Lisp: A Language for Stratified Design

  2. These restrictions are placed on us by the library this was originally designed to test, speako, which does not support non-standard extensions to the as-yet informally specified GraphQL Schema Language. Yet even with these restrictions speako is quite usable and flexible in practice, and there are straightforward workarounds for obtaining multiple bidirectional relations between two entities if one absolutely requires them. Arguably the benefit of keeping the GQL Schema Language syntax simpler (and more portable to other GQL backends) is well worth the tradeoff of slightly more restrictions on the nature of relations.

  3. field-gen takes a prefix so that we can ensure that each set of fields generated for our entities is unique (which is a requirement of speako).