Stash

Stash is a graph-based cache for Node.js powered by Redis.

Warning! Stash is just a mental exercise at this point. Feedback is very much appreciated, but using it in production may cause you to contract ebola or result in global thermonuclear war.

Overview

“There are only two hard things in computer science: cache invalidation and naming things.”

One of the most difficult parts about caching is managing dependencies between cache entries. In order to reap the benefits of caching, you typically have to denormalize the data that's stored in the cache. Since data from child items is then stored within parent items, it can be challenging to figure out what entries to invalidate in the cache in response to changes in data.

Here's a concrete example. Consider the following data model, powering a blog:

A Page can have zero or more Posts, each of which can have zero or more Comments. In order to quickly retrieve the contents of a Post, you'll want to denormalize its data and cache it. This means that when you store a Post, it will contain all of its Comments — and when you store a Page, it will contain all of its Posts. Not just the IDs, but the actual data — otherwise you'd have to do multiple cache lookups to reconstitute the parent objects, and the point of the cache is to speed up the loading process.

Here's some sample data representing a couple of Posts, each with Comments, rolled up into a Page.

comment1 = { id: "comment1", author: "John Doe", text: "What an interesting post!" };
comment2 = { id: "comment2", author: "Jane Doe", text: "I know, right? I should become a writer." };
post1 = {
  id: "post1",
  author: "Jane Doe",
  text: "Here's a post. It's fascinating."
  comments: [comment1, comment2]
};
comment3 = { id: "comment3", author: "John Doe", text: "Yeah, this post is boring. Don't quit your day job." };
post2 = {
  id: "post2",
  author: "Jane Doe",
  text: "Here's another post, which is not nearly as fascinating.",
  comments: [comment3]
};
page1 = {
  id: "page1",
  posts: [post1, post2]
};

To cache this information, we can serialize it to a format like JSON. This is what page1 would look like after being serialized:

{
  "id": "page1",
  "posts": [
    "id": "post1",
    "author": "Jane Doe",
    "text": "Here's a post. It's fascinating.",
    "comments": [{
      "id": "comment1",
      "author": "John Doe",
      "text": "What an interesting post!"
    },{
      "id": "comment2",
      "author": "Jane Doe",
      "text": "I know, right? I should become a writer."
    }]
  },{
    "id": "post2",
    "author": "Jane Doe",
    "text": "Here's another post, which is not nearly as fascinating.",
    "comments": [{
      "id": "comment3",
      "author": "John Doe",
      "text": "Yeah, this post is boring. Don't quit your day job."
    }]
  }]
}

This denormalization means that you'll be able to retrieve all of the data necessary to display the Post with a single lookup.

Unfortunately, it also means that if someone edits one of the Comments, the data stored for the Post becomes stale. Likewise, when someone edits one of the Posts, the data stored for the Page becomes stale. In order to ensure that you are serving correct data, your application must remove the stale data from the cache.

Stash helps to solve this problem by representing your cache entries as a directed graph of dependencies. If you consider the relationship of dependencies in our example, it looks like this:

Let's say John comes along and edits his first comment. This means that the data stored at this position is no longer valid, and should be removed from the cache:

That's not too hard, but since we decided to denormalize the data for comment1 into post1 and page1, we have to remove those items from the cache as well. When you invalidate an entry, Stash will walk the dependency graph and remove any dependent entries from the cache as well. In this case, Stash would remove the following subtree from the graph, leaving the rest alone:

Even when an item is invalidated, Stash remembers the dependencies between them, so you only need to alter the dependency graph when the relationships between your items change, not when the data changes.

API

The Stash API is designed to feel similar to Redis commands. Many functions support operations on multiple keys at once; if you use them in this way, Stash does its best to parallelize the calls to Redis for efficiency.

All callbacks use the standard Node.js convention of (err, data).

new Stash( config )

Creates a new instance of Stash and connects to Redis. Configuration options may include:

get( key, callback )

Gets the item stored at the specified key.

set( key, value, [callback] )

Sets the value stored at the specified key. If a callback is specified, it will be called after the item is set.

dadd( key, dependencies..., [callback] )

Declares that the value stored at key depends on values stored at the keys specified in dependencies. The dependencies may be either an array, varargs, or a combination of both. If a callback is specified, it will be called after the dependencies are created.

drem( key, [dependencies...], [callback] )

Removes dependency links between the value stored at key and the values stored at the keys specified in dependencies. The dependencies may be either an array, varargs, or a combination of both. You may also omit dependencies in which case all dependencies will be removed. If a callback is specified, it will be called with an array of keys that were previously dependencies.

dget( keys..., callback )

Gets the dependency links for the values stored at keys. If one key is specified, the callback will be called with a hash, with these properties:

If you call dget with multiple keys, the callback will be called with a structure like this:
{
  key1: { in: [...], out: [...] },
  key2: { in: [...], out: [...] },
  ...
}

dset( key, dependencies..., [callback] )

Sets the dependency links for the value stored at key to the keys specified in dependencies, replacing any links that previously existed. The dependencies may be either an array, varargs, or a combination of both. The callback will be called with a hash, with these properties:

inv( keys..., [callback] )

Declares that the value stored at each key in keys is no longer valid and should be removed from the cache. The keys may be either an array, varargs, or a combination of both. If a callback is specified, it will be called with an array of keys that were removed from the cache.

rem( keys..., [callback] )

Removes the entries stored at each key in keys from the cache. The keys may be either an array, varargs, or a combination of both. This differs from inv() in that it does not remove dependent items, and destroys the edges of the graph related to the items. Use this when an item no longer exists, and won't be added to the cache again. If a callback is specified, it will be called after the items are removed.

quit( [force] )

Disconnects from Redis. If you want to disconnect immediately and terminate in-flight requests, pass true for force.

Fork me on GitHub