This article was first posted on Medium. Please leave me your comments there, thanks!
We all know the importance of caching to improve the performance of our applications. There are multiple places where we can add a cache layer, and today we are going to see how to add one at the application level. This cache will prevent us from hitting the Datastore again and again for the same unmodified data.
But how do we easily cache Datastore entity Keys or Datastore Queries? And how do we know when to invalidate the cache so we are sure to always fetch the latest data from the Datastore?
I am going to document here the process I went through to answer those questions. It first led me to release gstore-cache. But I then realized that the cache mechanism could be used for other NoSQL databases, so I separated the cache logic (nsql-cache) from the database implementation and made it vendor agnostic.
I have just released the first database adapter for the Google Datastore: nsql-cache-datastore. This cache layer sits right in front of the @google-cloud/datastore client and automatically manages the cache for you.
The default, “magic” way
I am going to show you straight away how easy it is to add a cache layer to your existing application with nsql-cache-datastore. Hopefully this way I will be able to keep your attention throughout the post… :)
Install the dependencies
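Assuming an npm-based project, something along these lines:

```shell
npm install --save @google-cloud/datastore nsql-cache nsql-cache-datastore
```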
Instantiate the cache
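A sketch of the setup, following the nsql-cache-datastore README (the `datastore.js` file name and the exports are my own convention):

```javascript
// datastore.js
const Datastore = require('@google-cloud/datastore');
const NsqlCache = require('nsql-cache');
const dsAdapter = require('nsql-cache-datastore');

const datastore = new Datastore();       // the Google Datastore client
const db = dsAdapter(datastore);         // wrap it with the Datastore adapter
const cache = new NsqlCache({ db });     // LRU memory cache, default config

module.exports = { datastore, cache };
```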
That’s it. With these 3 lines of code, you’ve added an LRU memory cache for your entity fetching that will give your app a performance boost right away. It has the following default configuration:
- Maximum number of objects in cache: 100
- TTL (time to live) for entities (fetch by Key): 10 minutes
- TTL for Queries: 5 seconds
The rest of your application code does not change. Import the @google-cloud/datastore instance from the file above and use its API.
datastore.createQuery() and all the necessary methods of @google-cloud/datastore have been wrapped by nsql-cache, so you don’t have to worry about the cache.
If you don’t like so much magic, I will show you below how to deactivate the wrapping of the client and manage the cache manually.
A nice feature to highlight is batch operations (multiple keys) with the datastore.get() method: nsql-cache will only fetch the keys that it does not find in the cache (with multiple stores — which we will see below — it goes through each cache sequentially, looking for the keys not found in the previous one). For example, if key1 is in the cache but not key2, nsql-cache will only fetch key2 from the Datastore.
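A quick sketch of such a batch fetch (the "User" Kind and the ids are hypothetical; `./datastore` is the wrapped client from the setup above):

```javascript
const { datastore } = require('./datastore'); // wrapped @google-cloud/datastore client

const key1 = datastore.key(['User', 123]);
const key2 = datastore.key(['User', 456]);

// If key1 is already cached but key2 is not, nsql-cache only asks the
// Datastore for key2 and merges both results into the response.
datastore.get([key1, key2]).then(([entities]) => {
    console.log(entities);
});
```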
A memory cache on the server is great for a quick performance boost, but it has, of course, its limitations (e.g. in a serverless environment there is no such thing as shared memory between requests).
Let’s see how we can connect nsql-cache to a global Redis database.
Connect to Redis
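nsql-cache relies on cache-manager stores under the hood, so for Redis this means providing the cache-manager-redis-store package (the connection options here are illustrative):

```javascript
const Datastore = require('@google-cloud/datastore');
const NsqlCache = require('nsql-cache');
const dsAdapter = require('nsql-cache-datastore');
const redisStore = require('cache-manager-redis-store');

const datastore = new Datastore();
const db = dsAdapter(datastore);

const cache = new NsqlCache({
    db,
    // Declare a Redis store instead of the default in-memory LRU
    stores: [{ store: redisStore, host: 'localhost', port: 6379 }],
});
```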
You now have a Redis cache with the following default configuration:
- TTL for entities (keys): 1 day
- TTL for queries: 0 → infinite
Infinite cache for queries? Really? … Yes :)
A Query on the Datastore is always associated with an Entity Kind. This means that if we have a way to keep a reference to all the queries we have made for each Entity Kind, we can then invalidate their cache only when an entity of the same Kind is added/updated or deleted.
And that’s exactly what nsql-cache is doing when a Redis client is provided. Each time a Datastore Query successfully resolves, 3 operations take place:
- Generate a unique cache key for the Query
- Save the response from the Query in Redis at this cache key
- In a parallel operation, save the cache key into a Redis Set associated with the Query’s Entity Kind
The next time we add, update or delete an entity, nsql-cache will:
- Read the Redis Set members (cache keys) for this entity Kind
- Delete all the cache keys (and thus invalidate the queries cache)
Depending on the size of your application, keeping an infinite cache for the queries might be too much for you (yes it can get very big!). Let’s see how to set a different Time To Live for Keys and Queries.
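A sketch, reusing the Redis store from before (the durations are arbitrary examples):

```javascript
const cache = new NsqlCache({
    db,
    stores: [{ store: redisStore }],
    config: {
        ttl: {
            keys: 60 * 10,    // 10 minutes for entities fetched by Key
            queries: 60 * 60, // 1 hour for queries (instead of infinite)
        },
    },
});
```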
As you can see, you just need to provide a duration in seconds for each type of cache, and Redis will automatically delete the expired keys.
Note: the TTL duration defined here in the configuration can be overridden on any request later on.
Those paying attention have probably noticed that the stores setting is an Array. This is because nsql-cache uses the great cache-manager library under the hood that lets you define multiple cache stores with different TTL values in each one.
This allows you, for example, to have one extremely fast memory cache for your most accessed entities/queries (with a short TTL), and a second Redis cache for longer TTLs (also extremely fast, although some latency from the network I/O cannot be avoided).
Let’s see how we would set up 2 cache stores.
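Something along these lines — the stores are checked in the order they are declared, and `'memory'` is the built-in cache-manager store:

```javascript
const cache = new NsqlCache({
    db,
    stores: [
        { store: 'memory', max: 200 }, // first store: in-memory LRU (max 200 objects)
        { store: redisStore },         // second store: Redis
    ],
});
```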
And to change the default TTL values for each store, provide a configuration object in the ttl config by store name.
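A sketch of such a per-store configuration (the TTL values are arbitrary):

```javascript
const cache = new NsqlCache({
    db,
    stores: [{ store: 'memory' }, { store: redisStore }],
    config: {
        ttl: {
            memory: { keys: 60 * 5, queries: 60 },     // short TTLs in memory
            redis: { keys: 60 * 60 * 24, queries: 0 }, // long (or infinite) TTLs in Redis
        },
    },
});
```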
As we have seen, nsql-cache automatically keeps a reference to all the queries of each Entity Kind (if a Redis client has been provided). In some cases, you might want to aggregate multiple queries and save them as one key/value. nsql-cache has a method for that:
Let’s see an example where we make multiple queries to fetch the data for the Home page of a website.
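A sketch of such an aggregation — the Kinds, the cache key and the `cache.get()` retrieval call follow my reading of the nsql-cache README, so double-check them there:

```javascript
const { datastore, cache } = require('./datastore');

const getHomePageData = async () => {
    // 1. Look for the aggregated data in the cache first
    const cached = await cache.get('website:home');
    if (cached) return cached;

    // 2. Cache miss: run the queries
    const [posts] = await datastore.createQuery('BlogPost').limit(10).run();
    const [tags] = await datastore.createQuery('Tag').run();
    const data = { posts, tags };

    // 3. Save everything under one cache key, linked to the "BlogPost" and
    // "Tag" Kinds so it is invalidated whenever an entity of either Kind
    // is added, updated or deleted.
    await cache.queries.kset('website:home', data, ['BlogPost', 'Tag']);
    return data;
};
```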
The advanced, “manual” way
If you don’t want so much magic and prefer to manage the cache yourself, you can disable the wrapping of the @google-cloud/datastore client and use the NsqlCache API.
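If I recall the config flag correctly (check the nsql-cache README), disabling the wrapping is a single option:

```javascript
const cache = new NsqlCache({
    db,
    config: {
        wrapClient: false, // leave the @google-cloud/datastore API untouched
    },
});
```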
You are now in charge of managing the cache. You can use yet another layer of abstraction with the cache.keys.read() and cache.queries.read() helpers.
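As a sketch — the helper names and return shapes below are from my reading of the nsql-cache API, and the "Company" Kind is hypothetical — the cache.keys.read() and cache.queries.read() helpers check the cache first and fall back to the Datastore on a miss:

```javascript
const { datastore, cache } = require('./datastore');

const getCompany = async () => {
    const key = datastore.key(['Company', 'Google']);

    // keys.read() looks in the cache first; on a miss it fetches the Key
    // from the Datastore and primes the cache before resolving.
    const entity = await cache.keys.read(key);

    // Same pattern for queries:
    const query = datastore.createQuery('Company').limit(10);
    const response = await cache.queries.read(query);

    return { entity, response };
};
```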
Or you can go 100% manual… (are you sure you want that?)
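Fully manual means doing the cache lookup, the Datastore fetch and the cache priming yourself. A sketch, assuming cache.keys.get()/cache.keys.set() work as I describe in the comments (verify against the API docs):

```javascript
const getCompany = async () => {
    const key = datastore.key(['Company', 'Google']);

    let entity = await cache.keys.get(key); // look in the cache only
    if (!entity) {
        [entity] = await datastore.get(key); // cache miss: hit the Datastore
        await cache.keys.set(key, entity);   // …and prime the cache
    }
    return entity;
};
```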
If you are not wrapping the datastore client, read the nsql-cache API documentation and look at the examples in the nsql-cache-datastore repository.
That’s it. I hope this post has shown you how easy it is to add a cache layer for your Google Datastore entities. In a future post I hope to come back with some benchmarks (if someone can point me to a good tool/service for that, I’d appreciate it!).
Please leave me your comments about the approach I’ve taken in the Medium article, and if you see any improvement that could be done, let me know!
Thanks for reading!