Storex — A modular and portable database abstraction ecosystem for JavaScript

Wednesday, January 23, 2019

Databases are not new. In fact, they can be pretty boring. Yet we keep re-implementing the same stuff over and over again because the context in which we build things keeps changing. But should it really matter that much whether we’re running client- or server-side, on PostgreSQL, MongoDB or Firebase, using microservices, monoliths or Lambda functions? In the end, we still have to solve the same problems, albeit in different configurations. I don’t know about you, but I’m getting pretty tired of having to keep thinking about schema migrations, REST APIs, live migrations of data, conflict resolution for offline-first applications synced between multiple devices, access control and all of these things we keep solving over and over throughout the years.

That’s how Storex was born. Memex, a client-side full-text search engine for everything you’ve seen online, was growing into a multi-purpose knowledge management tool. We aim to make it as decentralised as possible, but we needed much more time to research the available technology and come up with a good architecture that meets every expectation of a modern application. So we needed the freedom to move data around without locking ourselves into a specific technology, or having to rewrite our entire stack once the technology choices became clearer.

What Storex is, and is not

Storex is a collection of packages that implement everything you need to build a full application that handles data, and that are modular enough to be recombined in any way necessary. The core package only provides a way to describe your data model (a user has email addresses, which can have verification codes, etc.) and a way for different backends to execute operations on your data. How backends then implement these operations is up to them. Then there are different packages that build on top of this, like the schema migration package, or the future access control package. The core of Storex and the packages around it are based on one philosophy: provide unified ways to talk about data and the interaction with it, then allow different back-ends and packages to implement these things in different contexts.

It follows that Storex is not a framework, but a library: re-usable, very loosely coupled packages that can be recombined. It’s also not an ORM, even though one could be built on top of it (which I highly advise against, since ORMs encourage mixing business logic with storage logic).

What does this look like in practice? Let’s start from an example:

import StorageManager from 'storex'
import { DexieStorageBackend } from 'storex-backend-dexie'

const storageBackend = new DexieStorageBackend({dbName: 'my-awesome-product'})
const storageManager = new StorageManager({ backend: storageBackend })
storageManager.registry.registerCollections({
  user: {
    version: new Date(2018, 11, 11),
    fields: {
      identifier: { type: 'string' },
      isActive: { type: 'boolean' },
    },
    indices: [
      { field: 'identifier' },
    ]
  },
  todoList: {
    version: new Date(2018, 7, 11),
    fields: {
      title: { type: 'string' },
    },
    relationships: [
      {childOf: 'user'} // creates one-to-many relationship
    ],
    indices: []
  },
  todoListEntry: {
    version: new Date(2018, 7, 11),
    fields: {
      content: {type: 'text'},
      done: {type: 'boolean'}
    },
    relationships: [
      {childOf: 'todoList', reverseAlias: 'entries'}
    ]
  }
})
await storageManager.finishInitialization()

const user = await storageManager.collection('user').createObject({
  identifier: 'email:boo@example.com',
  isActive: true,
  todoLists: [{
    title: 'Procrastinate this as much as possible',
    entries: [
      {content: 'Write intro article', done: true},
      {content: 'Write docs', done: false},
      {content: 'Publish article', done: false},
    ]
  }]
})
// user now contains things generated by the underlying backend, like ids and random keys if you have such fields
console.log(user.id)

await storageManager.collection('todoList').findObjects({user: user.id}) // You can also use MongoDB-like queries

Here we define the schema of your data and the relationships within it in a graph-like way (using childOf, singleChildOf and connects; see more in the docs). Then we can perform different read/write operations on the database, which get delegated to the backend, which translates them into lower-level, idiomatic ways of dealing with the underlying database. (In this case the backend is IndexedDB through Dexie client-side, but a Sequelize backend for talking to SQL databases server-side is also available.) The idea is that each back-end (and plugin) can implement its own specific namespaced operations, while the ones that can be shared among a wide variety of different databases (createObject, transaction, etc.) are carefully designed and standardized. There’s a standard feature detection mechanism so your code can automatically detect whether it can run on top of a new back-end, or adapt to different feature sets (direct full-text search, or needing an external full-text search database, for example). And if all else fails, each database allows you to access the lower-level connection directly.
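
To make the feature-detection idea concrete, here is a minimal, self-contained sketch. It does not use Storex’s real API: the supports() method, the feature names and both backend objects are assumptions made up for illustration. The point is only that business logic asks the backend what it can do and picks a strategy accordingly.

```javascript
// Hypothetical backends: each one advertises the features it implements.
// createBackend() and supports() are illustrative, not Storex's actual API.
function createBackend(name, features) {
  const featureSet = new Set(features)
  return {
    name,
    supports: feature => featureSet.has(feature),
  }
}

const clientBackend = createBackend('client', ['transaction', 'fullTextSearch'])
const serverBackend = createBackend('server', ['transaction'])

// Business logic picks a search strategy based on what the backend reports.
function chooseSearchStrategy(backend) {
  if (backend.supports('fullTextSearch')) {
    return 'direct' // query the database's own full-text index
  }
  return 'external' // fall back to a separate full-text search service
}

console.log(chooseSearchStrategy(clientBackend)) // direct
console.log(chooseSearchStrategy(serverBackend)) // external
```

Because the check happens at runtime, the same application code can be pointed at a new backend and degrade gracefully instead of failing on an unsupported operation.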

A side effect of this design is that your application can run both fully client-side and server-side in highly scalable settings without changing your business logic. Just change the above initialization of the StorageManager to:

import { SequelizeStorageBackend } from 'storex-backend-sequelize'

const storageBackend = new SequelizeStorageBackend({host, username, password, database: 'my-awesome-product'})
const storageManager = new StorageManager({ backend: storageBackend })

Also, the Dexie back-end supports ultra-fast full-text client-side search using a custom stemmer, which is currently used in the Memex browser extension to remember everything you see and your notes about it, letting you search all of your knowledge without sacrificing your privacy! My favorite benefit of this portability between client- and server-side is how it improves my developer workflow: I can run my entire application in-memory client-side, so I don’t have to run a back-end while developing, and I can initialize the database using fixtures to easily develop certain features with a certain data-set. Or, of course, do test-driven development for your client-side application directly from the command-line using an in-memory database, which will then also work client-side using Dexie.

Solving common problems surrounding your data

But deploying real-world applications involves more than just querying and manipulating your data. That’s why there will be different packages building on Storex that address these problems in a way that allows you to adapt to changing environments as your application grows.

One of these problems is schema migrations. Apart from SQL abstraction layers, every technology out there seems to have its own solution. Wouldn’t it be great if we could just describe how a migration should be done and have things be modular enough so we can execute it with one line of code, generate an SQL script, generate a Firebase function, or maybe even use this to transform data imported from a file or received over the network? This is how storex-schema-migrations works. It’s divided into a few sub-packages that calculate the difference between two schemas, generate a migration, and can execute it. However, you can also take this generated migration yourself, so you can translate it to an SQL script to hand to a DBA, or do other things with it. Or use the schema diff yourself to visualize your schema changes. Or maybe you want to apply them to your unit test fixtures so you don’t have to update them by hand all the time.
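
The diff-then-generate split described above can be sketched in a few lines. This is not the storex-schema-migrations API: diffSchemas(), generateMigration() and the operation names are hypothetical, and the sketch only handles added and removed collections and fields. It shows why keeping the migration as plain data is useful: the same list of steps could be executed, printed as SQL, or applied to fixtures.

```javascript
// Own-property check, so inherited keys like 'toString' are never matched.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

// Compare two schemas shaped like the registerCollections() argument above.
// Hypothetical helper, not part of storex-schema-migrations.
function diffSchemas(from, to) {
  const diff = {
    collections: { added: [], removed: [] },
    fields: { added: [], removed: [] },
  }
  for (const name of Object.keys(to)) {
    if (!has(from, name)) diff.collections.added.push(name)
  }
  for (const name of Object.keys(from)) {
    if (!has(to, name)) {
      diff.collections.removed.push(name)
      continue
    }
    const before = from[name].fields
    const after = to[name].fields
    for (const field of Object.keys(after)) {
      if (!has(before, field)) diff.fields.added.push(`${name}.${field}`)
    }
    for (const field of Object.keys(before)) {
      if (!has(after, field)) diff.fields.removed.push(`${name}.${field}`)
    }
  }
  return diff
}

// Turn a diff into an ordered list of plain-data steps (names are made up).
function generateMigration(diff) {
  return [
    ...diff.collections.added.map(collection => ({ type: 'addCollection', collection })),
    ...diff.fields.added.map(path => ({ type: 'addField', path })),
    ...diff.fields.removed.map(path => ({ type: 'removeField', path })),
    ...diff.collections.removed.map(collection => ({ type: 'removeCollection', collection })),
  ]
}

const schemaV1 = {
  user: { fields: { identifier: { type: 'string' } } },
}
const schemaV2 = {
  user: { fields: { identifier: { type: 'string' }, isActive: { type: 'boolean' } } },
  todoList: { fields: { title: { type: 'string' } } },
}

const migration = generateMigration(diffSchemas(schemaV1, schemaV2))
console.log(migration)
```

A real migration also has to deal with renamed fields, changed types, indices and data transformations, which is where most of the actual work lies.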

There are lots of other features on the roadmap, which we’re implementing as we need them. Some of these are:

Synchronization of data sources: when working in a peer-to-peer setting, or creating offline-first applications where multiple devices need to sync their offline changes to the cloud, you need to record changes, replay them, and resolve conflicts that arise. This can be quite a difficult problem, but it is a common one. Nothing database-agnostic exists for it, which means that everybody has to keep solving it, leading to lots of shitty offline-first applications out there. (This is a really high priority for Memex, where people want to search their history across multiple devices.) You can contribute your thoughts, use cases and requirements here.
Access control: when working with a data source shared among different users, you need to decide who gets access to what. Aside from designing a permission system for every new application, the way this is done changes drastically when moving from a Backend as a Service or any other system that wants to manage your permissions, to when you implement your own REST / GraphQL API. Ideally, you should be able to declare your permissions and enforce them in flexible ways. This idea is explained further here.
Automatic backend APIs: should it really matter that much where the code lives that accesses your database? Ideally you should be able to move that code around freely between front-end and back-end, while not caring about what actually moves your data around. The idea here is to again be able to write your storage logic with some annotations, and just generate your API server and consumer automatically. This functionality should be designed in such a way that you can customize your API layout if needed, but have sane defaults, while being modular enough to port between different HTTP frameworks (Express, Koa, etc.) or to split your API into different micro-services. (Get involved here, although some new insights need to be documented there.) Having this in place would allow you to develop your application entirely in your browser for quick iteration, while which backend you use and how your microservices are split up are just configuration options.
Registering your operations: kind of like prepared statements, this would allow you to register what kind of operations you’re going to do in one place. This would allow backends to optimize translation into low-level queries, while giving you one place with an overview of how your application interacts with your data. That overview helps you, for example, to get a better idea of where to place your indices, how to shard your data, and what data higher-level operations are reading and manipulating (this ties in with access control), and to generate very cool analytics about database performance.
Live migrations between different databases: your stack will keep evolving as your application grows and matures. This means that at some point, you’re going to move your data to a different database. A popular approach is also to start developing your application using a BaaS and create your own back-end when you start to scale, to save costs. This involves writing to two databases at once and other tricks. There should be code for these tricks to make the process as easy and reliable as possible.
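
As an illustration of the “writing to two databases at once” trick from the last item, here is a minimal sketch. Nothing in it is Storex code: the in-memory stores stand in for real databases, and createMemoryStore(), createDualWriteStore() and backfill() are hypothetical names. Reads keep coming from the old store, new writes go to both, and a backfill copies whatever was written before dual-writing started.

```javascript
// In-memory stand-in for a real database; for illustration only.
function createMemoryStore() {
  const objects = new Map()
  return {
    async put(id, object) { objects.set(id, { ...object }) },
    async get(id) { return objects.get(id) },
    async ids() { return [...objects.keys()] },
  }
}

// Wraps two stores: reads come from the old one, writes go to both.
function createDualWriteStore(oldStore, newStore) {
  return {
    async put(id, object) {
      await oldStore.put(id, object)
      await newStore.put(id, object)
    },
    get: id => oldStore.get(id),
  }
}

// Copies objects that predate dual-writing. Objects already present in the
// new store are skipped so fresher dual-written data is never overwritten.
async function backfill(oldStore, newStore) {
  for (const id of await oldStore.ids()) {
    if ((await newStore.get(id)) === undefined) {
      await newStore.put(id, await oldStore.get(id))
    }
  }
}

async function migrateDemo() {
  const oldStore = createMemoryStore()
  const newStore = createMemoryStore()
  await oldStore.put('1', { title: 'written before the migration' })

  const store = createDualWriteStore(oldStore, newStore)
  await store.put('2', { title: 'written during the migration' })
  await backfill(oldStore, newStore)

  return (await newStore.ids()).sort()
}

// Logs the ids that ended up in the new store.
migrateDemo().then(ids => console.log(ids))
```

Once the backfill has finished and the two stores agree, reads can be switched to the new store and the old one retired. A production version would also need to handle failed writes to one of the two stores, deletes, and updates that race with the backfill.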

Next steps

I hope you’re as excited as I am about being able to create awesome applications that you can run wherever and however you want, while really being able to concentrate on creating instead of setting up infrastructure that you have already implemented dozens of times. So try it out, and please let us know what you think! Feel free to create and contribute to issues, or reach out to me directly!