Runtime Data Validation from TypeScript Interfaces

For the last year or so, I’ve been (slowly) building a TypeScript-based Node.js framework called Extollo. One of the design goals with Extollo is to only expose the user (i.e. the developer) to ES/TypeScript native concepts, in order to reduce the amount of special knowledge required to get up and running with the framework.

Runtime schemata: a plague of DSLs

One of my biggest pet peeves with the current Node.js framework scene is that nearly every ecosystem has to re-invent the wheel when it comes to schema definitions. Because JavaScript doesn’t have a native runtime type-specification system (at least, not a good one), if you want to encode details about how a data structure should look at runtime, you need to design a system for passing that information along at runtime.

For example, a prolific MongoDB ODM for Node.js, Mongoose, gives users the ability to specify the schema of the records in the collection when the user defines a model. Here’s a sample schema definition from the Mongoose docs:

import mongoose from 'mongoose';
const { Schema } = mongoose;

const blogSchema = new Schema({
    title: String, // String is shorthand for {type: String}
    author: String,
    body: String,
    comments: [{ body: String, date: Date }],
    date: { type: Date, default: Date.now },
    hidden: Boolean,
    meta: {
        votes: Number,
        favs: Number
    }
});

I’m currently building the request validation system for Extollo. Because it has to process web requests with dynamic input, the validator interfaces need to be specified at runtime so the data can be checked against the schema. To do this, I’m using the fantastic Zod schema validator library written by Colin McDonnell.

However, Zod falls victim to the same fundamental problem with runtime schemata in JavaScript as Mongoose. Because its schemata need to be available at runtime, you have to use Zod’s custom schema builder to define your interfaces. Here’s an example of a schema for some data that might come from a login page:

import { z } from 'zod'

export const LoginAttemptSchema = z.object({
    username: z.string().nonempty(),
    password: z.string().nonempty(),
    rememberMe: z.boolean().optional(),
})

That’s not too bad, but it does require the developer to learn Zod’s specific schema definition language. I find this especially annoying since TypeScript already has an interface definition language! This is a situation where I’d like to avoid making the developer learn an equivalent system if they already know the one built into the language.

Let’s rewrite this schema in TypeScript for a start:

export interface LoginAttempt {
    /** @minLength 1 */
    username: string

    /** @minLength 1 */
    password: string

    rememberMe?: boolean
}

Okay, that’s an improvement! We can use TypeScript’s native type syntax to define the interface, and augment it with JSDoc comments for any properties that can’t be natively expressed. So, to use this with Zod, we need to convert it from the TypeScript syntax to the Zod syntax. Luckily, Fabien Bernard has spearheaded the excellent ts-to-zod project, which looks through interfaces defined in a file and outputs the equivalent Zod schemata for them.

Hmm... so now the user can write their schema definitions in (mostly) native TypeScript syntax, and, with a bit of helper tooling, we can convert them to the Zod format so we can use them at runtime. Perfect! Well, almost…

We have a subtle problem that arises when we want to actually use a schema at runtime. Let’s look at an example:

import { Validator } from '@extollo/lib'
import { LoginAttempt } from '../types/LoginAttempt'

class LoginController {
    public getValidator() {
        return new Validator<LoginAttempt>()
    }
}

This class has a method which returns a new Validator instance with the LoginAttempt schema as its type-parameter. Intuitively, this should produce a validator which, at runtime, validates data against the LoginAttempt schema. Let’s look at the compiled JavaScript:

"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
const lib_1 = require("@extollo/lib");
class LoginController {
    getValidator() {
        return new lib_1.Validator();
    }
}

Uh-oh. Ignoring the boilerplate noise, we see that our nice, type-parameterized Validator instance has been stripped of its type information. Why? TypeScript is a transpiler. So, it takes TypeScript code and outputs the equivalent JavaScript code. Because JavaScript has no concept of types at runtime, the transpiler (in this case, tsc) strips them out.

So now we have a problem. We’ve improved our user-interface by only requiring the developer to specify the TypeScript types, but now we can’t use them at runtime, because the TypeScript types get stripped away. ‘What about the Zod schema we just generated?’ you ask, wisely. Well, unfortunately, there’s no mapping between the interface and the Zod schema it induced, and there’s no easy way to create such a mapping, because it has to be done at compile-time.

A very deep rabbit-hole

Ordinarily, this is where the story ends. You need some kind of mapping between the interface and the Zod schema (which, remember, the developer has no idea exists thanks to our ts-to-zod magic) to make the Validator work. In a generic TypeScript project, you’d have to have some kind of naming convention, or expose the schema to the user somehow to create the mapping.
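For contrast, here’s a minimal sketch of what that explicit mapping tends to look like in a generic project. Nothing here is Extollo’s API: `Schema`, `ExplicitValidator`, and the hand-written `LoginAttemptSchema` are stand-ins made up for illustration, and a real project would pass a Zod schema rather than a hand-rolled one.

```typescript
// Hypothetical minimal schema contract; a real project would use Zod's types.
interface Schema<T> {
    parse(input: unknown): T
}

interface LoginAttempt {
    username: string
    password: string
    rememberMe?: boolean
}

// Hand-written stand-in for the generated Zod schema.
const LoginAttemptSchema: Schema<LoginAttempt> = {
    parse(input: unknown): LoginAttempt {
        const data = input as Partial<LoginAttempt> | null
        if (!data || typeof data.username !== 'string' || data.username.length < 1) {
            throw new Error('username must be a non-empty string')
        }
        if (typeof data.password !== 'string' || data.password.length < 1) {
            throw new Error('password must be a non-empty string')
        }
        if (data.rememberMe !== undefined && typeof data.rememberMe !== 'boolean') {
            throw new Error('rememberMe must be a boolean when present')
        }
        return {
            username: data.username,
            password: data.password,
            rememberMe: data.rememberMe,
        }
    },
}

// The type parameter T is erased at compile time, so the schema has to be
// handed over as a real runtime value alongside it.
class ExplicitValidator<T> {
    private readonly schema: Schema<T>

    constructor(schema: Schema<T>) {
        this.schema = schema
    }

    validate(input: unknown): T {
        return this.schema.parse(input)
    }
}

// The developer has to name the interface AND its schema.
const explicitValidator = new ExplicitValidator<LoginAttempt>(LoginAttemptSchema)
const attempt = explicitValidator.validate({ username: 'alice', password: 'hunter2' })
```

It works, but the developer has to know that the schema exists and keep the two names paired up by hand, which is exactly the leak I wanted to avoid.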

However, Extollo has a unique advantage that I suspected could be used to solve this problem transparently: excc. Extollo projects are primarily TypeScript projects, but they also contain other files like views, assets, &c. that need to be included in the built-out bundle. To standardize all of this, Extollo uses its own project-compiler called excc for builds. excc is primarily a wrapper around tsc that does some additional pre- and post-processing to handle the cases above.

Because Extollo projects are all using excc, this means that we can do arbitrary processing at compile time. I suspected that there would be a way to create a mapping between the interfaces and the schemata we generate for runtime.

Zod-ifying the Interfaces

The first step was converting the TypeScript interfaces to Zod schemata using ts-to-zod. In excc, this is implemented as a pre-processing step that appends the Zod schema to the .ts file that contains the interface. So, the processed LoginAttempt.ts might look something like:

import { z } from "zod";

export interface LoginAttempt {
    /** @minLength 1 */
    username: string

    /** @minLength 1 */
    password: string

    rememberMe?: boolean
}

export const exZodifiedSchema = z.object({
    username: z.string().nonempty(),
    password: z.string().nonempty(),
    rememberMe: z.boolean().optional(),
});

This has some drawbacks. Namely, it assumes that only one interface is defined per-file. However, Extollo enforces this convention for other concepts like models, middleware, controllers, and config files, so it’s fine to make that assumption here.

This gets us closer, but it still doesn’t do the mapping for the runtime schema. The first step to this is going to be devising some way of referencing a schema so that we can easily modify the TypeScript code that uses its related interface.

I don’t love the initial system I have for this, but what excc does now is generate a unique ID number for each interface it Zod-ifies. Then, when it is writing the Zod schema into the interface’s file, it adds code to register it with a global service that maps the ID number to the Zod schema at runtime. So, the above file would actually look something like:

import { z } from "zod";
import { registerZodifiedSchema } from "@extollo/lib";

export interface LoginAttempt {
    /** @minLength 1 */
    username: string

    /** @minLength 1 */
    password: string

    rememberMe?: boolean
}

/** @ex-zod-id 11@ */
export const exZodifiedSchema = z.object({
    username: z.string().nonempty(),
    password: z.string().nonempty(),
    rememberMe: z.boolean().optional(),
});
registerZodifiedSchema(11, exZodifiedSchema);

This may not seem like much, but this is a huge step toward our goal. We now have, at compile time, a mapping of interfaces to IDs and, at runtime, a mapping of IDs to schemata. So, we can use the compile-time map to modify all the places that reference the interface to set a runtime parameter with the ID of the schema for that interface. Then, at runtime, we can look up the schema using the ID. Bingo! Now, how do we actually do that…
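To make the runtime half of that concrete, here’s a rough sketch of what a global ID-to-schema service could look like. Only the name `registerZodifiedSchema` comes from the generated code above; its body, the `RuntimeSchema` type, and the `getZodifiedSchema` helper are assumptions for illustration, not the actual implementation in @extollo/lib.

```typescript
// Hypothetical minimal contract standing in for a Zod schema.
interface RuntimeSchema {
    parse(input: unknown): unknown
}

// Process-wide map from the compile-time ID to the runtime schema.
const zodifiedSchemata = new Map<number, RuntimeSchema>()

// Same name as the call written into the interface's file;
// this body is an assumed implementation.
function registerZodifiedSchema(id: number, schema: RuntimeSchema): void {
    if (zodifiedSchemata.has(id)) {
        throw new Error(`Schema ID ${id} is already registered`)
    }
    zodifiedSchemata.set(id, schema)
}

// Hypothetical lookup helper used at validation time.
function getZodifiedSchema(id: number): RuntimeSchema {
    const schema = zodifiedSchemata.get(id)
    if (!schema) {
        throw new Error(`No schema registered for ID ${id}`)
    }
    return schema
}

// Registration happens once, when the interface's module is first loaded.
registerZodifiedSchema(11, { parse: (input: unknown) => input })
const found = getZodifiedSchema(11)
```

In this sketch a duplicate ID is treated as an error, on the assumption that each ID is generated exactly once per build. Whether the real service is that strict is a design choice I’m guessing at here.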

Wrangling the AST

Now that we have our mapping, we need to make sure that a look-up is done whenever the type is referenced in code. That is, anywhere where we create a Validator<LoginAttempt>, we should set the ID of the Zod schema for LoginAttempt on that Validator instance.

To accomplish this, I wrote a couple of transformer plugins for TypeScript. Now, tsc doesn’t support plugins by default. (You may have seen plugins in the tsconfig.json for a project, but they are plugins for the editor’s language server, not the compiler.) Luckily for us, again, there exists a fantastic open-source package to solve this problem. Ron S. maintains a package called ts-patch which, aptly, patches the tsc installation for a project to allow the project to specify compiler-plugins.

These plugins operate on the abstract syntax-tree of the TypeScript program. If you’re not familiar with ASTs, they’re basically the compiler’s internal representation of the program you’re compiling. They are data structures which can be manipulated and optimized. When you install a plugin, it is called repeatedly with the AST for each source file in the TypeScript project you’re compiling. Importantly, the plugin can replace any node in a file’s AST, or return a completely different tree, and tsc will output the modified version instead of the original.

First, Identify

The first plugin operates on the entire AST for each file in the project. Its job is to walk through each file’s AST and look for interface declarations that we generated Zod schemata for. When it finds one, it parses out the ID number we wrote into the file earlier and stores a mapping between that ID number and the symbol TypeScript uses to identify the interface internally.

Because we were the ones that wrote the Zod schema into the file, we can know that it – and the call to register it with the global service – are the last statements in the file. So, we can quickly look them up and parse out the ID from the registration call.

The TypeScript AST for the augmented file, at this point, looks something like this:

(As an aside, I used the ts-ast-viewer web app to generate this hierarchy. ts-ast-viewer is a project started by David Sherret that allows you to visualize and explore the AST for any TypeScript program. It was invaluable in helping me figure out the structures for this project.)

By recursively walking the AST, we can look for the InterfaceDeclaration nodes. If we find one in a file, we can check the root of the file to see if an Identifier called exZodifiedSchema is defined. If so, we grab the last statement in the file (an ExpressionStatement containing the call to registerZodifiedSchema) and pull out its first argument, the ID number.

Once this transformer finishes, we’ve identified all of the interfaces for which we generated Zod schemata and created a mapping from the interface to the ID number we need at runtime.
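The walk described above can be sketched without pulling in the compiler at all. The `AstNode` shape below is a deliberately tiny stand-in for TypeScript’s real AST (it is not the compiler API), `identifySchemaId` is a hypothetical helper, and the sketch keys the result on the interface’s name, whereas the real plugin keys it on the compiler’s symbol.

```typescript
// Simplified stand-in for compiler AST nodes; the real AST is far richer.
interface AstNode {
    kind: string
    name?: string   // identifier text, where applicable
    value?: number  // numeric literal value, where applicable
    children: AstNode[]
}

// Small helper for building nodes in the example tree below.
function node(
    kind: string,
    fields: { name?: string, value?: number } = {},
    children: AstNode[] = [],
): AstNode {
    return { kind, children, ...fields }
}

// Recursively collect every node of the given kind.
function findAll(root: AstNode, kind: string, found: AstNode[] = []): AstNode[] {
    if (root.kind === kind) found.push(root)
    for (const child of root.children) findAll(child, kind, found)
    return found
}

// Returns the interface/ID pair for a file, or undefined if it was not Zod-ified.
function identifySchemaId(
    sourceFile: AstNode,
): { interfaceName: string, id: number } | undefined {
    const [iface] = findAll(sourceFile, 'InterfaceDeclaration')
    if (!iface || !iface.name) return undefined

    // Is exZodifiedSchema declared at the root of the file?
    const hasSchema = sourceFile.children.some(
        stmt => stmt.kind === 'VariableStatement' && stmt.name === 'exZodifiedSchema',
    )
    if (!hasSchema) return undefined

    // The registration call is the last statement; its first argument is the ID.
    const last = sourceFile.children[sourceFile.children.length - 1]
    if (!last || last.kind !== 'ExpressionStatement') return undefined
    const call = last.children[0]
    if (!call || call.kind !== 'CallExpression') return undefined
    if (call.name !== 'registerZodifiedSchema') return undefined
    const firstArg = call.children[0]
    if (!firstArg || firstArg.kind !== 'NumericLiteral') return undefined
    if (firstArg.value === undefined) return undefined

    return { interfaceName: iface.name, id: firstArg.value }
}

// A hand-built tree mirroring the statements of the augmented LoginAttempt.ts.
const loginAttemptFile = node('SourceFile', {}, [
    node('ImportDeclaration'),
    node('ImportDeclaration'),
    node('InterfaceDeclaration', { name: 'LoginAttempt' }),
    node('VariableStatement', { name: 'exZodifiedSchema' }),
    node('ExpressionStatement', {}, [
        node('CallExpression', { name: 'registerZodifiedSchema' }, [
            node('NumericLiteral', { value: 11 }),
            node('Identifier', { name: 'exZodifiedSchema' }),
        ]),
    ]),
])

const identified = identifySchemaId(loginAttemptFile)
```

Because the pre-processing step controls the layout of the generated statements, the lookup can bail out early as soon as the file stops matching the expected shape.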

Then, Modify

The second plugin runs after the first has finished going through all the files in the project. This plugin’s job is to replace any NewExpression nodes where the type parameters contain Zod-ified interfaces with an IIFE that sets the __exZodifiedSchemata property to an array of the ID numbers used to look up the schemata for those interfaces.

That is, the plugin transforms this:

new Validator<LoginAttempt>()

into this:

(() => {
    const vI = new Validator<LoginAttempt>();
    vI.__exZodifiedSchemata = [11];
    return vI;
})()

And because the NewExpression is an expression just like the CallExpression is, anywhere where we have a NewExpression can instead have this CallExpression that wraps it with additional logic. The transformer is able to look up the ID numbers associated with the interfaces because the Identifier that references the interface in new Validator<LoginAttempt>() has the same symbol set on it as the InterfaceDeclaration we identified with the first plugin.

These symbols are created by a part of the TypeScript compiler called the binder, and the type checker then matches each use of an identifier to the symbol for its declaration, even if the identifier has been renamed along the way (for example, by an aliased import). So, we can use these symbols to match up uses of the interface with the declarations of the interfaces we care about.

This is the magic sauce that finally makes it all work. After this plugin runs, the program TypeScript finishes compiling has all of the runtime type mappings linked up to the Validator instances based on which interface was specified when the Validator was instantiated.
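To round this out, here’s a sketch of how a Validator might consume that property at runtime. The `__exZodifiedSchemata` property and the wrapping IIFE follow the transformed output above. Everything else is assumed for illustration: the `schemaRegistry` map, the `RuntimeSchema` type, the `parse` method, and the choice to use only the first ID are not taken from the real @extollo/lib Validator.

```typescript
// Hypothetical minimal contract standing in for a Zod schema.
interface RuntimeSchema {
    parse(input: unknown): unknown
}

// Stand-in for the global ID-to-schema service.
const schemaRegistry = new Map<number, RuntimeSchema>()

interface LoginAttempt {
    username: string
    password: string
    rememberMe?: boolean
}

class Validator<T> {
    // Assigned by the compile-time rewrite; stays undefined if T was never Zod-ified.
    public __exZodifiedSchemata?: number[]

    public parse(input: unknown): T {
        // This sketch only uses the first ID in the array.
        const [id] = this.__exZodifiedSchemata ?? []
        const schema = id === undefined ? undefined : schemaRegistry.get(id)
        if (!schema) {
            throw new Error('No runtime schema is attached to this validator')
        }
        return schema.parse(input) as T
    }
}

// Stand-in for the schema registered by the generated code in LoginAttempt.ts.
schemaRegistry.set(11, {
    parse(input: unknown): unknown {
        const data = input as { username?: unknown, password?: unknown } | null
        if (
            !data
            || typeof data.username !== 'string' || data.username.length < 1
            || typeof data.password !== 'string' || data.password.length < 1
        ) {
            throw new Error('Invalid login attempt')
        }
        return input
    },
})

// What `new Validator<LoginAttempt>()` looks like after the second plugin runs.
const validator = (() => {
    const vI = new Validator<LoginAttempt>()
    vI.__exZodifiedSchemata = [11]
    return vI
})()

const parsed = validator.parse({ username: 'alice', password: 'hunter2' })
```

A Validator constructed without the rewrite has no IDs attached, so in this sketch it fails loudly instead of silently accepting unvalidated input.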

Conclusion

This was a long and kind of hacky journey, but the end result is excellent. From the developer’s perspective, they can type this:

const validator = new Validator<LoginAttempt>();

and, at runtime, the validator instance will have a Zod schema and will be able to parse data against the schema. No custom schema definition languages, no validator/schema mapping, nothing. To the developer, it’s all just pure TypeScript, which was the goal all along.

The code for this feature is still very much a work in progress, and I have to remove a lot of unused code, clean up what I keep, and probably rewrite part of it to be a bit less… jank. But, at the end of the day, I’m really happy with this “magic” feature that will help keep Extollo projects TypeScript-native and easy to maintain.

You can find a minimal working example matching this post here.