This is a guest post by Karol Broda. I like building tools that solve problems I keep running into, and doba came out of one of those.
Most apps have the same data in multiple shapes. A database row has a password hash and internal metadata. The frontend gets a sanitized version without any of that. The AI endpoint needs a flat struct with just the fields the model cares about. A legacy API from two years ago returns something different entirely.
The typical solution is a handful of functions. toFrontendUser(), toAIUser(), fromLegacyV1(). Each one is fine on its own. The problem is that they don't know about each other.
Someone needs legacy-to-AI. There's no function for that, so they chain two together and hope the intermediate shape doesn't change. Nobody writes tests for these because they're "just mapping."
Then a schema change ships and half your transforms silently produce wrong data.
I wrote doba to deal with this (source). It's a schema registry that works with any Standard Schema compatible library, but I use it with Valibot, and the two pair well for a few reasons.
Valibot's modular architecture means your bundle only includes what you actually use. A registry with ten schema variants doesn't pull in validators that only two of them use. With most other schema libraries, you'd import the whole thing regardless.
Valibot's type inference is also what makes doba's typed migrations work. When you define a schema with v.object(), the inferred type flows straight into the migration function signature. Rename a field in the Valibot schema and the migration won't compile until you fix it. That feedback loop is the core of what makes this useful.
Schemas and migrations
You register your Valibot schemas and define migrations between them. Each migration function is fully typed against the source and target schemas.
```typescript
import { createRegistry } from 'dobajs';
import * as v from 'valibot';

const databaseUser = v.object({
  id: v.string(),
  email: v.pipe(v.string(), v.email()),
  passwordHash: v.string(),
  createdAt: v.pipe(v.string(), v.isoTimestamp()),
  settings: v.object({
    theme: v.picklist(['light', 'dark']),
    notifications: v.object({
      email: v.boolean(),
      push: v.boolean(),
    }),
  }),
});

const frontendUser = v.object({
  id: v.string(),
  email: v.pipe(v.string(), v.email()),
  createdAt: v.pipe(v.string(), v.isoTimestamp()),
  settings: v.object({
    theme: v.picklist(['light', 'dark']),
    notifications: v.object({
      email: v.boolean(),
      push: v.boolean(),
    }),
  }),
});

const aiUser = v.object({
  id: v.string(),
  email: v.string(),
  theme: v.string(),
  hasNotifications: v.boolean(),
});

const registry = createRegistry({
  schemas: { database: databaseUser, frontend: frontendUser, ai: aiUser },
  migrations: {
    'database->frontend': (user) => ({
      id: user.id,
      email: user.email,
      createdAt: user.createdAt,
      settings: user.settings,
    }),
    'frontend->ai': (user) => ({
      id: user.id,
      email: user.email,
      theme: user.settings.theme,
      hasNotifications:
        user.settings.notifications.email || user.settings.notifications.push,
    }),
  },
});
```
Add a new required field to a target schema and every migration pointing at it lights up red until you handle it. That's Valibot's type inference doing the work.
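To make the typing idea concrete outside of doba, here is a minimal standalone sketch of how a registry could type its 'from->to' keys so that each migration's parameter and return types come from the schemas. It is not doba's implementation: EdgeKey and MigrationMap are hypothetical names, and the hand-written types stand in for what v.InferOutput<typeof databaseUser> and friends would produce.

```typescript
// Hypothetical sketch, not doba's source: typing "from->to" migration keys.
// These aliases stand in for v.InferOutput<typeof databaseUser> etc.
type Settings = {
  theme: 'light' | 'dark';
  notifications: { email: boolean; push: boolean };
};
type DatabaseUser = {
  id: string;
  email: string;
  passwordHash: string;
  createdAt: string;
  settings: Settings;
};
type FrontendUser = Omit<DatabaseUser, 'passwordHash'>;
type AiUser = { id: string; email: string; theme: string; hasNotifications: boolean };

type Schemas = { database: DatabaseUser; frontend: FrontendUser; ai: AiUser };

// Every "a->b" key where a and b are different schema names.
type EdgeKey<S> = {
  [F in keyof S & string]: `${F}->${Exclude<keyof S & string, F>}`;
}[keyof S & string];

// For each key, the migration takes the source type and must return the target type.
type MigrationMap<S> = {
  [K in EdgeKey<S>]?: K extends `${infer F}->${infer T}`
    ? F extends keyof S
      ? T extends keyof S
        ? (input: S[F]) => S[T]
        : never
      : never
    : never;
};

const migrations: MigrationMap<Schemas> = {
  // `user` is inferred as DatabaseUser; the return value must be a FrontendUser.
  'database->frontend': (user) => ({
    id: user.id,
    email: user.email,
    createdAt: user.createdAt,
    settings: user.settings,
  }),
  // `user` is inferred as FrontendUser; the return value must be an AiUser.
  'frontend->ai': (user) => ({
    id: user.id,
    email: user.email,
    theme: user.settings.theme,
    hasNotifications: user.settings.notifications.email || user.settings.notifications.push,
  }),
};
```

Because each value's signature is derived from its key, adding a required field to FrontendUser would make the 'database->frontend' entry fail to compile until it returns that field.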
You don't need to write a migration for every possible pair of schemas either. If there's no direct path, doba walks the graph and chains through intermediate ones automatically:
```typescript
// We only defined database->frontend and frontend->ai,
// but this still works. doba routes through frontend.
const result = await registry.transform(databaseData, 'database', 'ai');
```
Migration context
Legacy migrations are full of quiet decisions. Defaulting an ID because the old format didn't have one. Guessing an email from a name field. Mapping a boolean called darkMode to a theme enum.
In a regular transform function, all of that disappears into the function body. You migrate 10k legacy users, and three months later someone asks why half of them have unknown@example.com as their email. Nobody remembers what the migration assumed.
doba passes a context object to every migration so you can record what you defaulted and why.
```typescript
const legacyUser = v.object({
  name: v.optional(v.string()),
  darkMode: v.optional(v.boolean()),
});

// A separate registry that also includes the legacy schema
const extendedRegistry = createRegistry({
  schemas: {
    database: databaseUser,
    frontend: frontendUser,
    ai: aiUser,
    legacy: legacyUser,
  },
  migrations: {
    // ...previous migrations
    'legacy->frontend': (user, ctx) => {
      ctx.defaulted(['id'], 'generated new id');
      ctx.defaulted(['createdAt'], 'set to current timestamp');

      let email = 'unknown@example.com';
      if (user.name && user.name.length > 0) {
        email = `${user.name.toLowerCase().replace(/\s+/g, '.')}@legacy.example.com`;
        ctx.warn(`converted name "${user.name}" to email`);
      } else {
        // Record the placeholder so nobody has to guess later where it came from
        ctx.defaulted(['email'], 'no name in source, used placeholder email');
      }

      return {
        id: `legacy-${Date.now()}`,
        email,
        createdAt: new Date().toISOString(),
        settings: {
          theme: user.darkMode === true ? 'dark' : 'light',
          notifications: { email: false, push: false },
        },
      };
    },
  },
});

const result = await extendedRegistry.transform(
  { name: 'Alice Johnson', darkMode: true },
  'legacy',
  'frontend'
);

if (result.ok) {
  result.meta.defaults; // [{ path: ['id'], message: 'generated new id', ... }, { path: ['createdAt'], ... }]
  result.meta.warnings; // [{ message: 'converted name "Alice Johnson" to email', ... }]
}
```
result is a discriminated union: ok: true carries the value and metadata, ok: false carries typed validation errors. No try/catch needed.
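For readers curious how a context object and a result union fit together, here is a minimal standalone sketch. MigrationContext, runMigration, Note and the validate callback are hypothetical names, not doba's API; the validator simply stands in for checking the migrated output against the target schema.

```typescript
// Hypothetical sketch: a migration context that records decisions,
// plus a discriminated-union result instead of thrown errors.
type Path = readonly string[];
type Note = { path?: Path; message: string };

class MigrationContext {
  readonly defaults: Note[] = [];
  readonly warnings: Note[] = [];

  // Record that a value at `path` was filled in rather than copied.
  defaulted(path: Path, message: string): void {
    this.defaults.push({ path, message });
  }

  // Record a lossy or guessed conversion.
  warn(message: string): void {
    this.warnings.push({ message });
  }
}

type Result<T> =
  | { ok: true; value: T; meta: { defaults: Note[]; warnings: Note[] } }
  | { ok: false; issues: string[] };

// Runs a migration, then checks the output with a caller-supplied validator
// (a stand-in for validating against the target schema).
function runMigration<In, Out>(
  input: In,
  migrate: (input: In, ctx: MigrationContext) => Out,
  validate: (output: Out) => string[],
): Result<Out> {
  const ctx = new MigrationContext();
  const value = migrate(input, ctx);
  const issues = validate(value);
  if (issues.length > 0) return { ok: false, issues };
  return { ok: true, value, meta: { defaults: ctx.defaults, warnings: ctx.warnings } };
}

type Legacy = { name?: string };
type Modern = { email: string };

const converted = runMigration<Legacy, Modern>(
  { name: 'Alice Johnson' },
  (user, ctx) => {
    if (!user.name) {
      ctx.defaulted(['email'], 'no name in source, used placeholder');
      return { email: 'unknown@example.com' };
    }
    ctx.warn(`converted name "${user.name}" to email`);
    return { email: `${user.name.toLowerCase().replace(/\s+/g, '.')}@legacy.example.com` };
  },
  (out) => (out.email.includes('@') ? [] : ['email must contain @']),
);

if (converted.ok) {
  console.log(converted.value.email); // alice.johnson@legacy.example.com
  console.log(converted.meta.warnings.length); // 1
} else {
  console.log(converted.issues);
}
```

Checking converted.ok narrows the type, so value and meta are only reachable on success and issues only on failure, which is what makes the try/catch unnecessary.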
Most migrations are mechanical. Renaming a field, dropping another, adding a default. doba has a pipe builder for that so you don't have to write the boilerplate by hand:
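```typescript
// Inside the migrations map: frontend is database minus passwordHash
'database->frontend': {
  pipe: (p) => p.drop('passwordHash'),
},
```
The builder tracks the shape as you chain. A .rename('foo', 'bar') followed by .drop('foo') is a type error, because 'foo' no longer exists after the rename.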
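Shape tracking like this can be expressed with TypeScript's Omit and mapped types. The sketch below is a hypothetical, dependency-free Pipe class (from, drop, rename and run are illustrative names, and doba is not necessarily built this way): each method returns a new Pipe whose second type parameter is the updated shape, while the runtime steps operate on a plain object.

```typescript
// Hypothetical typed pipe builder; not doba's implementation.
type Step = (obj: Record<string, unknown>) => Record<string, unknown>;

class Pipe<In, Out> {
  private readonly steps: readonly Step[];

  private constructor(steps: readonly Step[]) {
    this.steps = steps;
  }

  // Start a pipe whose input and current shape are both T.
  static from<T>(): Pipe<T, T> {
    return new Pipe<T, T>([]);
  }

  // Remove a key; only keys that still exist in the current shape are accepted.
  drop<K extends keyof Out & string>(key: K): Pipe<In, Omit<Out, K>> {
    return new Pipe<In, Omit<Out, K>>([
      ...this.steps,
      (obj) => {
        const next = { ...obj };
        delete next[key];
        return next;
      },
    ]);
  }

  // Move a value to a new key and update the tracked shape accordingly.
  rename<From extends keyof Out & string, To extends string>(
    from: From,
    to: To,
  ): Pipe<In, Omit<Out, From> & { [P in To]: Out[From] }> {
    return new Pipe<In, Omit<Out, From> & { [P in To]: Out[From] }>([
      ...this.steps,
      (obj) => {
        const next = { ...obj };
        const value = next[from];
        delete next[from];
        next[to] = value;
        return next;
      },
    ]);
  }

  // The runtime object is untyped; the class's type parameters carry the shape.
  run(input: In): Out {
    let current: Record<string, unknown> = { ...(input as unknown as Record<string, unknown>) };
    for (const step of this.steps) current = step(current);
    return current as unknown as Out;
  }
}

type DbRow = { id: string; email: string; passwordHash: string };

// safe is typed as Omit<DbRow, 'passwordHash'>
const safe = Pipe.from<DbRow>()
  .drop('passwordHash')
  .run({ id: 'u1', email: 'a@example.com', passwordHash: 'x' });

// Pipe.from<{ foo: string }>().rename('foo', 'bar').drop('foo');
// does not compile: after the rename, 'foo' is no longer a key of the shape.
```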
Identifying unknown data
Sometimes you get data and don't know which schema it came from. doba can figure that out and transform it:
```typescript
import { createRegistry, match } from 'dobajs';

const registry = createRegistry({
  schemas: { database: databaseUser, frontend: frontendUser, ai: aiUser },
  migrations: {
    // ...same as before
  },
  identify: {
    database: match.field('passwordHash'),
    frontend: match.fields('createdAt', 'settings'),
    ai: match.field('hasNotifications'),
  },
});

const result = await registry.identifyAndTransform(unknownData, 'ai');

if (result.ok) {
  result.value; // transformed data
  result.meta.from; // which schema it detected
  result.meta.path; // the route it took, e.g. ['database', 'frontend', 'ai']
}
```
Guards run in definition order, which is why database comes first: a database row also has createdAt and settings, so it would match the frontend guard too. If none match, you get a typed error, not a runtime crash.
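The ordering rule is easy to see in a small standalone sketch. identify, hasFields and Guard below are hypothetical helpers, not doba's match API; they only illustrate first-match-wins over an ordered list and a typed no-match result instead of a thrown error.

```typescript
// Hypothetical sketch of ordered shape guards; not doba's match API.
type Guard = (data: unknown) => boolean;

// True when data is a non-null object that has every listed key.
const hasFields =
  (...keys: string[]): Guard =>
  (data) => {
    if (typeof data !== 'object' || data === null) return false;
    const obj = data; // const binding keeps the narrowed `object` type inside the callback
    return keys.every((key) => key in obj);
  };

type Identified<Name extends string> =
  | { ok: true; name: Name }
  | { ok: false; error: 'no-match' };

// Guards are checked in array order, and the first match wins.
function identify<Name extends string>(
  data: unknown,
  guards: ReadonlyArray<readonly [Name, Guard]>,
): Identified<Name> {
  for (const [name, guard] of guards) {
    if (guard(data)) return { ok: true, name };
  }
  return { ok: false, error: 'no-match' };
}

// Order matters: a database row also has createdAt and settings,
// so the database guard must come before the frontend guard.
const guards = [
  ['database', hasFields('passwordHash')],
  ['frontend', hasFields('createdAt', 'settings')],
  ['ai', hasFields('hasNotifications')],
] as const;

const row = {
  id: '1',
  email: 'a@example.com',
  passwordHash: 'h',
  createdAt: '2024-01-01T00:00:00.000Z',
  settings: {},
};

identify(row, guards); // { ok: true, name: 'database' }
identify({ foo: 1 }, guards); // { ok: false, error: 'no-match' }
```

Swapping the first two entries would make every database row identify as frontend, which is the kind of silent misclassification the ordering rule is there to prevent.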
Where this helps
The examples above are simplified, but the pattern shows up in a lot of places.
API versioning is the obvious one. You ship v1, then v2 changes the shape, then v3 splits a field into two. Clients are still sending all three versions. Instead of writing v1-to-v3 and v2-to-v3 converters by hand, you define v1-to-v2 and v2-to-v3 and the registry chains them. When v4 ships, you add one migration and everything upstream still works.
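Chaining versions like this amounts to finding a path in a graph whose nodes are schema versions and whose edges are migrations. The sketch below is a hypothetical illustration using breadth-first search, which yields a route with the fewest migrations; findPath and migrateVia are illustrative names, and doba's actual routing strategy may differ.

```typescript
// Hypothetical sketch: route through a migration graph with breadth-first search.
type Migrate = (input: unknown) => unknown;
type Edges = Record<string, Migrate>; // keys look like "v1->v2"

// Shortest chain of versions from `from` to `to`, or null if unreachable.
function findPath(edges: Edges, from: string, to: string): string[] | null {
  const next = new Map<string, string[]>();
  for (const key of Object.keys(edges)) {
    const [a, b] = key.split('->');
    if (a === undefined || b === undefined) continue;
    next.set(a, [...(next.get(a) ?? []), b]);
  }

  const previous = new Map<string, string>();
  const seen = new Set([from]);
  const queue = [from];
  while (queue.length > 0) {
    const node = queue.shift()!;
    if (node === to) {
      // Walk back through `previous` to rebuild the route.
      const path = [to];
      let cur = to;
      while (cur !== from) {
        cur = previous.get(cur)!;
        path.unshift(cur);
      }
      return path;
    }
    for (const n of next.get(node) ?? []) {
      if (!seen.has(n)) {
        seen.add(n);
        previous.set(n, node);
        queue.push(n);
      }
    }
  }
  return null;
}

// Apply each migration along the route in order.
function migrateVia(edges: Edges, data: unknown, from: string, to: string): unknown {
  const path = findPath(edges, from, to);
  if (path === null) throw new Error(`no migration path from ${from} to ${to}`);
  let value = data;
  for (let i = 0; i + 1 < path.length; i++) {
    value = edges[`${path[i]}->${path[i + 1]}`]!(value);
  }
  return value;
}

const edges: Edges = {
  'v1->v2': (d) => ({ ...(d as { name: string }), version: 2 }),
  // v3 splits `name` into two fields
  'v2->v3': (d) => {
    const { name, ...rest } = d as { name: string; version: number };
    const [first = '', last = ''] = name.split(' ');
    return { ...rest, firstName: first, lastName: last, version: 3 };
  },
};
```

Adding a 'v3->v4' edge would be enough for findPath(edges, 'v1', 'v4') to return ['v1', 'v2', 'v3', 'v4'], which is the "add one migration" property described above.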
LLM pipelines have a similar problem. Your database has a rich, nested user object, but the model prompt needs a flat struct with five fields. That transform is easy to write once. It's less easy to keep correct when the database schema evolves or when you need three different prompt formats for different models.
Legacy imports are where the migration context really pays off. If you're pulling records from an old system and half the fields are missing or renamed, every decision you make during that conversion ("defaulted email because the source didn't have one") is recorded. When someone asks about it six months later, the metadata is right there on the result.
Even something like a webhook handler fits. You receive payloads from a third party that's changed their format twice. You don't control when they migrate. identifyAndTransform figures out which version came in and normalizes it.
If any of this sounds like a problem you have, take a look. Feedback and issues welcome.
Thanks to Fabian Hiller and the Valibot team for having me on the blog.
