When saving entities to App Engine’s datastore at a high write rate, avoid monotonically increasing values such as timestamps. Generally speaking, you don’t have to worry about this until your application reaches hundreds of writes per second. Once you’re in that ballpark, you may want to look for hotspots in your application that can increase datastore latency.
To see why, let’s examine what happens to the underlying Bigtable of an application with a high write rate. When a Bigtable tablet, a contiguous unit of storage, experiences a high write rate, it has to “split” into more than one tablet. The split lets new writes be spread across the resulting tablets. Here’s a visual approximation of what happens:
Remember that for every indexed value, the datastore must also write corresponding index rows. When values are randomly or even semi-randomly distributed (user email addresses, say), tablet splits work well, because the work of writing many values is distributed among several Bigtable tablets:
The problems appear when we start saving monotonically increasing values like timestamps, or inserting dictionary words in alphabetical order:
The new writes aren’t evenly distributed, and whichever tablet they go to becomes a new hot tablet in need of a split.
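Here’s a toy model of that picture in Python. It is a simplification, not how Bigtable actually routes writes: each tablet is assumed to own a contiguous key range, represented by a sorted list of split points, and we count which tablet each new key lands on. The names `tablet_for`, `writes_per_tablet` and `SPLITS` are made up for this sketch.

```python
import bisect
import random
from collections import Counter


def tablet_for(key, split_points):
    """Return the index of the tablet whose key range contains `key`.

    Tablet i holds keys in [split_points[i-1], split_points[i]); the last
    tablet is open-ended, so it receives every key above the final split.
    """
    return bisect.bisect_right(split_points, key)


def writes_per_tablet(keys, split_points):
    """Count how many of `keys` land on each tablet."""
    return Counter(tablet_for(k, split_points) for k in keys)


# Four tablets, split on existing data at keys 250, 500 and 750.
SPLITS = [250, 500, 750]

# Ever-growing keys, like timestamps written after all existing data.
monotonic = list(range(1000, 1100))

# The same number of keys spread across the whole key space.
rng = random.Random(42)
spread = [rng.randrange(0, 1000) for _ in range(100)]

print(writes_per_tablet(monotonic, SPLITS))  # Counter({3: 100})
print(writes_per_tablet(spread, SPLITS))     # writes land on several tablets

# Splitting the hot tablet doesn't help for long: the next batch of
# timestamps lands entirely on the newest tablet.
SPLITS_AFTER = SPLITS + [1050]
later = list(range(1100, 1200))
print(writes_per_tablet(later, SPLITS_AFTER))  # Counter({4: 100})
```

Random keys are shared among the tablets, while the timestamps all pile onto the last one, and splitting that tablet only moves the hotspot to the newest tablet.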
As a developer, what can you do to avoid this situation?
- Avoid indexing values unless you need to query against them. No index means no hot index tablet on an increasing value
- Lower your write rate, or figure out how to better distribute values. A pure random distribution is best, but even a distribution that isn’t random will be better than a predictable, monotonically increasing value
- Prefix a shard identifier to your value. This complicates queries, since you will need to prefix and unprefix the values, query each shard, and then merge the results in memory, but it will reduce the error rate of your writes
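A minimal sketch of the shard-prefix approach, in plain Python rather than the App Engine API: a sorted list of strings stands in for a datastore index, and `NUM_SHARDS`, `sharded_key`, `unshard` and `query_range` are hypothetical names. Consecutive timestamps get random prefixes, so they land in different parts of the key space; a range query then has to run once per shard and merge the per-shard results in memory.

```python
import bisect
import heapq
import random

NUM_SHARDS = 4  # hypothetical; more shards spread writes over more tablets


def sharded_key(timestamp, rng=random):
    """Prefix a random shard id, e.g. 1300000123 -> '02|001300000123'.

    Both parts are zero-padded so that string order matches numeric order.
    """
    return "%02d|%012d" % (rng.randrange(NUM_SHARDS), timestamp)


def unshard(key):
    """Drop the shard prefix and recover the integer timestamp."""
    return int(key.split("|", 1)[1])


def query_range(index, start, end):
    """Return timestamps in [start, end) from a sorted list of sharded keys.

    Each shard is queried separately (one range scan per prefix), then the
    already-sorted per-shard results are merged in memory.
    """
    per_shard = []
    for shard in range(NUM_SHARDS):
        lo = bisect.bisect_left(index, "%02d|%012d" % (shard, start))
        hi = bisect.bisect_left(index, "%02d|%012d" % (shard, end))
        per_shard.append([unshard(k) for k in index[lo:hi]])
    return list(heapq.merge(*per_shard))


# Usage: write 20 consecutive timestamps, then query a sub-range.
rng = random.Random(0)
index = sorted(sharded_key(ts, rng) for ts in range(1000, 1020))
print(query_range(index, 1005, 1010))  # [1005, 1006, 1007, 1008, 1009]
```

The cost of this design shows up at read time: every range query becomes `NUM_SHARDS` queries plus a merge, which is why it is worth reaching for only once write errors are a real problem.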
These tips apply whether you are on the Master-Slave or the High Replication datastore. One more tip: don’t prematurely optimize for this case, since chances are you won’t run into it. That time is better spent working on features.
P.S. Yes, I drew those doodles. No, I do not have any formal art training (how could you tell?!)
(This’ll be a shorter post than usual.)
Waiting for indexes to build can be a drag. Indexes must be built even before any matching entities exist, and the build can take longer than you’d expect if the global index-building workflow is backed up, since index building is a shared resource.
One little-known trick is to pre-build indexes before your application needs them by deploying a non-default version. Your application can have many versions. In Java App Engine, the version is defined in the version element of appengine-web.xml; in Python, it is the version element of app.yaml. The Java Eclipse plugin even has a screen where the version can be set (click the App Engine icon, then “Project Settings”):
Because all versions of an application share the same datastore, the required indexes will start building as soon as you push a version whose index configuration file (index.yaml in Python, datastore-indexes.xml in Java) includes the new indexes. With luck, they will have finished building by the time you are ready to push your real version.
In general, it is a best practice to maintain a staging version of your application for testing against live data. App Engine makes this trivial: deploy code tagged with a new “version”. That version is accessible at http://VERSION.latest.APPID.appspot.com (note that VERSION is a string, not an integer or decimal number), which is a handy and powerful way to validate a new test or staging version. When you have enough confidence in your application, browse to the Admin Console, click the radio button associated with your new version, and click “Make Default.”
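If you script your staging checks, a small helper can build the per-version URL and fetch a page from it. This is a sketch: `version_url` and `smoke_test` are hypothetical names, "myapp" and "staging" are placeholder values, and the only thing it assumes about App Engine is the http://VERSION.latest.APPID.appspot.com pattern described above.

```python
from urllib.request import urlopen


def version_url(app_id, version):
    """Build the URL for a specific (possibly non-default) version.

    `version` is the version string from app.yaml or appengine-web.xml,
    e.g. "staging"; it is a label, not a number.
    """
    return "http://%s.latest.%s.appspot.com" % (version, app_id)


def smoke_test(app_id, version, path="/", timeout=10):
    """Fetch `path` from the given version and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned value means the server did not report an error.
    """
    with urlopen(version_url(app_id, version) + path, timeout=timeout) as resp:
        return resp.status


print(version_url("myapp", "staging"))  # http://staging.latest.myapp.appspot.com
```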
Versioning has never been so easy. No configuring load balancers, rolling deploys, symlinking, restarting edge caches, etc.