Ikai Lan says

I say things!

Archive for the ‘Software Development’ Category

Debugging your Google OAuth 2.0 token when you get HTTP 401s or 403s

with one comment

One of the things I get asked about the most is OAuth 2.0 when developers start seeing 401s, 403s, and possibly other HTTP 4xx status codes. This post isn’t meant to be a comprehensive guide on OAuth debugging for Google/YouTube APIs. Rather, it’s a collection of some of the steps I find myself recommending or repeating when I’m trying to debug issues with OAuth authorization.

Enabling the APIs

I’ve recorded a short video describing how to enable Google API access for use with web and installed apps:

The checklist is:

  1. Did you create a new API project?
  2. Did you enable the APIs you are looking to use?
  3. Did you create a client ID and client secret?

A common mistake developers make is forgetting step #2 – enabling the APIs.

Getting a token

Here are my CliffsNotes on tokens:

  • Access tokens – used to make API calls. These expire after an hour.
  • Refresh tokens – you only get these if you request offline access when you ask a user to authorize. Your client exchanges them for new access tokens. Refresh tokens generally don’t expire.
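Those two bullet points translate directly into the bookkeeping your client needs to do. Here’s a minimal sketch (the class and field names are mine, purely illustrative – this is not the Google client library’s API) of tracking when an access token goes stale:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative token bookkeeping; names are hypothetical, not from Google's libraries.
class StoredToken {
    final String accessToken;
    final String refreshToken;   // null unless the user granted offline access
    final Instant issuedAt;

    StoredToken(String accessToken, String refreshToken, Instant issuedAt) {
        this.accessToken = accessToken;
        this.refreshToken = refreshToken;
        this.issuedAt = issuedAt;
    }

    // Access tokens expire after an hour; refresh a bit early rather than racing the deadline.
    boolean isProbablyExpired(Instant now) {
        return now.isAfter(issuedAt.plus(Duration.ofMinutes(55)));
    }
}

public class TokenDemo {
    public static void main(String[] args) {
        Instant issued = Instant.parse("2013-07-19T12:00:00Z");
        StoredToken token = new StoredToken("ya29.fake", null, issued);
        System.out.println(token.isProbablyExpired(issued.plus(Duration.ofMinutes(30)))); // false
        System.out.println(token.isProbablyExpired(issued.plus(Duration.ofMinutes(61)))); // true
    }
}
```

When `isProbablyExpired` returns true, refresh proactively instead of waiting for the 401.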

What causes 401s and other 4xx status codes?

The common causes for “401 Unauthorized” when making API calls with an access token are:

  • expired access token (most common)
  • Developer accidentally disabled the APIs (uncommon)
  • User revokes token (rare)

Sometimes, more explanation exists in the response body of an HTTP 4xx. In the Java client, for example, you should log the error, because the details will assist in troubleshooting:

try {
    // Make your Google API call
} catch (GoogleJsonResponseException e) {
    GoogleJsonError error = e.getDetails();
    // Print out the message and errors
    System.err.println(error.getMessage());
    System.err.println(error.getErrors());
}

Different versions will have different API signatures. Here’s a link to the current version’s Javadocs (1.1.15) – newer versions look like they might be deviating from this a bit.

Troubleshooting the token

Keep this tool in your toolbox: the tokenInfo API call. Take your access token and append it to the end of this URL:

https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=

You could take your existing code, make an API call here whenever you get an HTTP 4xx, and log the response. This’ll return some useful information:

  • When the token expires
  • What’s the token’s scope (this is important)
  • If the token is invalid

If you inherited the code rather than writing it yourself, it also tells you:

  • Whether this access token came from an offline refresh_token or not (the “offline” field)
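When you log the tokenInfo response, you usually only care about a few of those fields. Here’s a throwaway sketch of pulling fields out of the response body for logging – the helper and the sample JSON are mine, the field names follow the tokenInfo response format, and you’d want a real JSON parser in production code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenInfoPeek {
    // Extract a flat field like "scope" from a JSON object; logging-quality only.
    static String field(String json, String name) {
        Matcher m = Pattern.compile("\"" + name + "\"\\s*:\\s*\"?([^\",}]+)\"?").matcher(json);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Illustrative tokenInfo body; real responses include more fields.
        String body = "{ \"scope\": \"https://www.googleapis.com/auth/userinfo.email\","
                    + " \"expires_in\": 3319, \"access_type\": \"offline\" }";
        System.out.println("scope=" + field(body, "scope"));
        System.out.println("expires_in=" + field(body, "expires_in"));
        System.out.println("access_type=" + field(body, "access_type"));
    }
}
```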

If the token is invalid … well, that doesn’t help a lot. I would troubleshoot like this:

  1. Remove the access token from your datastore or database.
  2. Use the refresh token to acquire a new access token (if you are using a refresh token)
  3. Try to make the API call again. If it works, you’re good! If not …
  4. Check the access token against the tokenInfo API
  5. If it’s still invalid, do a full reauth
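The five steps above can be sketched as a small recovery routine. Everything here is illustrative – the token store, refresh call, API call, and tokenInfo check are passed in as functions so you can wire in whichever client and datastore you actually use:

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class TokenRecovery {
    /**
     * Illustrative recovery flow: drop the cached access token, mint a new one
     * from the refresh token, and verify it before falling back to a full reauth.
     * Returns a working access token, or null to signal "do a full reauth".
     */
    static String recover(Runnable dropCachedToken,
                          Supplier<String> refreshAccessToken,   // step 2
                          Predicate<String> apiCallSucceeds,     // step 3
                          Predicate<String> tokenInfoSaysValid)  // step 4
    {
        dropCachedToken.run();                       // step 1
        String fresh = refreshAccessToken.get();     // step 2
        if (fresh != null && apiCallSucceeds.test(fresh)) {
            return fresh;                            // step 3: it works, you're good
        }
        if (fresh != null && tokenInfoSaysValid.test(fresh)) {
            return fresh;                            // step 4: token is fine; problem is elsewhere
        }
        return null;                                 // step 5: full reauth
    }

    public static void main(String[] args) {
        // Fake collaborators stand in for a real datastore and OAuth client.
        String result = recover(() -> {}, () -> "new-token", t -> true, t -> true);
        System.out.println(result);
    }
}
```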

Hope this helps!

Written by Ikai Lan

July 19, 2013 at 12:02 pm

Posted in Software Development


Why “The Real Reason Silicon Valley Coders Write Bad Software” is wrong

with 12 comments

There was an article in The Atlantic this morning titled, “The Real Reason Silicon Valley Coders Write Bad Software” with the tagline, “If someone had taught all those engineers how to string together a proper sentence, Windows Vista would be a lot less buggy.” The author, Bernard Meisler, seems to think that the cause of “bad software” is, and I quote:

“But the downfall of many programmers, not just self-taught ones, is their lack of ability to sustain complex thought and their inability to communicate such thoughts. That leads to suboptimal code, foisting upon the world mediocre (at best) software like Windows Vista, Adobe Flash, or Microsoft Word.”

Not only are the article’s conclusions inaccurate, but it also paints a negative portrayal of software engineers that isn’t grounded in reality. For starters, there is a distinction between “bad” software and “buggy” software. Software that is bad tends to be the result of poor usability design. Buggy software, on the other hand, is a consequence of a variety of factors stemming from the complexity of modern software. The largest reduction in bugs isn’t going to come from improving the skills of individual programmers, but rather from instituting quality control processes throughout the software engineering lifecycle.

Bad Software

Bad software is software that, for whatever reason, does not meet the expectation of users. Software is supposed to make our lives simpler by crunching data faster, or automating repetitive tasks. Great software is beautiful, simple, and, given some input from a user, produces correct output using a reasonable amount of resources. When we say software is bad, we mean any combination of things: it’s not easy to use. It gives us the wrong output. It uses resources poorly. It doesn’t always run. It doesn’t do the right thing. Bugs contribute to a poor user experience, but are not the sole culprit for the negative experiences that users have with software. Let’s take one of the examples Meisler has cited: Windows Vista. A quick search for “why does windows vista suck” in Google turns up these pages:


Oh, bugs are on there, and they’re pretty serious. We’ll get to that. But what else makes Windows Vista suck, according to those sites? Overly aggressive security prompts. Overly complex menus (we’ll visit the idea of complexity again later, I promise). None of the menus make sense. A changed network interface. Widgets too small to be usable shipping with the system. Rearranged menus. Search that only works on 3 folders. Those last few things aren’t bugs; they’re usability problems. The software underneath is, for the most part, what we software engineers call working as intended. Some specification somewhere designed those features to work that way, and the job of the software engineers is, in many companies, to build the software to that specification, as ridiculous as the specification is.

One of my coworkers points out that Alan Cooper, creator of Visual Basic, wrote a great book about this subject titled “The Inmates are Running the Asylum”. Interestingly enough, his argument is that overly technical people are harmful when they design user interactions, and this results in panels like the Windows Vista search box with fifty different options. But, to be fair, even when user interactions are designed well, just making the software do what the user expects is hard. A simple button might be hiding lots of complex interactions underneath the hood to make the software easy to use. The hilarious and often insightful Steve Yegge talks about just that in a tangentially related post about complexity in software requirements. Software is generally called “bad” when it does not do what the user expects, and this is something that is really hard to get right.

Buggy Software

Buggy software, on the other hand, is software that does not behave as the software engineer expects or the specification dictates. This is in stark contrast to bad software, which is software that does not behave as the way a user expects. There’s often overlap. A trivial example: let’s suppose an engineer writes a tip calculator for mobile phones that allows a user to enter a dollar amount, and press a “calculate” button, which then causes the application to output 15% of the original bill amount on the screen. Let’s say a user uses the application, enters $100, and presses calculate. The amount that comes out is $1500. That’s not 15%! Or – the user presses calculate, and the application crashes. The user expects $15, but gets $1500. Or no result, because the application ends before it presents output.
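For concreteness, the whole bug fits in a few lines of Java (the amounts are the ones from the story above; the classic slip is writing 15% as a multiplication by 15):

```java
public class TipCalculator {
    // Buggy version: "15%" written as a multiplication by 15.
    static double buggyTip(double bill) {
        return bill * 15;       // $100 -> $1500
    }

    // What the spec meant.
    static double tip(double bill) {
        return bill * 0.15;     // $100 -> $15
    }

    public static void main(String[] args) {
        System.out.println(buggyTip(100)); // 1500.0 -- not 15%!
        System.out.println(tip(100));      // 15.0
    }
}
```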

Software is buggy partially because of bad documentation, as Meisler asserts, but not primarily because of it. Software isn’t even buggy because programmers can’t express “complex thoughts”, another of Meisler’s gems; all of programming is the ability to “combine simple ideas into compound ideas”. Software is buggy because of problems stemming out of complexity.

All software is built on top of abstractions. That is, someone else is responsible for abstracting away the details such that a programmer does not need to fully understand another system to be able to use it. As a programmer, I do not need to understand how my operating system communicates with the hard drive to save a file, or how my hard disk manages its read/write heads. I don’t need to write code that says, “move the write head” – I write code that says, “write this data to a file to this directory”.  Or maybe I just say, “just save this data” and never worry about files or directories. Abstractions in software are kind of like the organizational structure of a large company. The CEO of a large car manufacturer talks to the executive board about the general direction of the company. The executive staff then break these tasks down into more specific focus area goals for their directors, who then break these tasks into divisional goals for the managers, who then break these tasks into team goals and tasks for the individual employees that actually build, design, test, market, and sell the damn things. To make this more convoluted, it’s not all from top to bottom communication, either. There are plenty of cross team interactions, and interactions between layers of management that cross the chain of command.

To say that poor documentation is the primary source of bugs is laughable. Documentation exists to try to make sense of the complexity, but there is no way documentation can be comprehensive in any reasonably complex software with layers of abstraction because, as Joel Spolsky, founder of Fog Creek Software, says, abstractions leak. Programmers cannot know all the different ways the abstractions they are depending on will fail, and thus they cannot possibly report or handle all the different ways the abstraction they are working on will fail. More importantly: programmers cannot know how every possible combination of abstractions they depend on will produce subtly incorrect results that become more and more warped up the abstraction stack. It’s like the butterfly effect. By the time a bug surfaces, a programmer needs to chase it all the way down the rabbit hole, often into code he does not understand. Documentation helps, but no programmer reads documentation all the way down to the bottom of the stack before he writes code. It’s not commercially feasible for programmers to do this and retain all the information a priori. Non-trivial software is complex as hell underneath the hood, and it doesn’t help that even seemingly simple software often has to turn water into wine just to try to do what a user expects.

Software engineers and critical thinking

I don’t deny the importance of writing or critical thinking skills. They are crucial. I wouldn’t be surprised if the same ability to reason through complex thoughts allows people to write well as well as program well. But to assert that writing skills lead to reasoning skills? This is a case of placing the cart before the horse. Meisler is dismissive of the intellectual dexterity needed to write programs:

“Most programmers are self-taught and meet the minimum requirement for writing code — the ability to count to eight”

It’s not true. Programming often involves visualizing very abstract data structures, multivariate inputs and outputs, dealing with non-deterministic behavior, and simulating the concurrent interactions between several moving parts in your mind. When I am programming, I hold the state of several objects in my mental buffer and context switch several times a second, trying to understand how a small change I make in one place will ripple outwards. I do this hundreds of times a session. It’s a near trance-like state that takes me some time to get into before I am working at full speed, and it’s why programming is so damned hard. It’s why I can’t be interrupted and need a contiguous block of time to be fully effective on what Paul Graham calls the maker’s schedule. I’m not the only one who feels this way – many other programmers report experiencing the mental state that psychologists refer to as “flow” when they are performing at their best.

How to reduce the incidences of bugs

Philippe Beaudoin, a coworker, writes:

I like to express the inherent complexity of deep software stacks with an analogy, saying that software today is more like biology than mathematics. Debugging a piece of software is more like an episode of House than a clip from A Beautiful Mind. Building great software is about having both good bug prevention processes (code reviews, tests, documentation, etc.) as well as good bug correction processes (monitoring practices, debugging tools).

Trying to find a single underlying cause to buggy software is as preposterous as saying there is a single medical practice that would solve all of earth’s health problems.

Well said.

I’m disappointed in the linkbait title, oversimplification, and broad sweeping generalizations of Bernard Meisler’s article. I’m disappointed that this is how software engineering is being represented to a mainstream, non-techie audience. It’s ironic that the article touts writing skills but is poorly structured in arguing its point. It seems to conclude that writing skills are the reason code is buggy. No wait – critical thinking. Ah! Nope, surprise, writing skills, with a Steve Jobs quote that is used in a misleading way and taken out of context mixed in for good measure. He argues for the logic of language, but as many of us who also write for fun and profit know, human language is fraught with ambiguity, and there’s a lot less similarity between prose and computer programming languages than Meisler would have the mainstream audience believe. I’m sorry, Herr Meisler, but if your article were a computer program, it simply wouldn’t compile.

— Ikai

Written with special thanks to Philippe Beaudoin, Marvin Gouw, Alejandro Crosa, and Tansy Woan.

Written by Ikai Lan

October 9, 2012 at 11:14 pm

Clearing up some things about LinkedIn mobile’s move from Rails to node.js

with 13 comments

There’s an article on highscalability that’s talking about the move from Rails to node.js (for completeness: its sister discussion on Hacker News). It’s not the first time this information has been posted. I’ve kind of ignored it for now (because I didn’t want to be this guy), but it’s come up enough times and no one has spoken up, so I suppose it’s up to me to clear a few things up.

I was on the team at LinkedIn that was responsible for the mobile server, and while I wasn’t the primary contributor to that stack, I built and contributed several things, such as the unfortunate LinkedIn WebOS app which made use of the mobile server (and a few features) and much of the initial research behind productionizing JRuby for web applications (I did much more stuff that wasn’t published). I left LinkedIn in 2009, so I apologize if any new information has surfaced. My hunch is that even if I’m off, I’m not off by that much.

Basically: the article is leaving out several facts. We can all learn something from the mobile server and software engineering if we know the full story behind the whole thing.

In 2008, I joined a software engineering team at LinkedIn that was focused on building things outside the standard Java stack. You see, back then, to develop code for linkedin.com, you needed a Mac Pro with 6 gigs of RAM just to run your code. And those requirements kept growing. If my calculations are correct, the standard setup for engineers now is a machine with 20 or more gigabytes of RAM just to RUN the software. In addition, each team could only release once every 6 weeks (this has been fixed in the last few years). It was deemed that we needed to build out a platform off the then-fledgling API and start creating teams to get off the 6 week release cycle so we could iterate quickly on new features. The team I was on, LED, was created for this purpose.

Our first project was a rotating globe that showed off new members joining LinkedIn. It used to run on Poly9, but when they got shut down, it looks like someone migrated it to use Google Earth. The second major project was m.linkedin.com, a mobile web client for LinkedIn that would be one of the major clients of our fledgling API server, codenamed PAL. Given that we were building out an API for third parties, we figured that we could eat our own dogfood and build out LinkedIn for mobile phones with browsers. This is 2008, mind you. The iPhone had just come out, and it was a very Blackberry world.

The stack we chose was Ruby on Rails 1.2, and the deployment technology was Mongrel. Remember, this is 2008. Mongrel was cutting edge Ruby technology. Phusion Passenger wasn’t released yet (more on this later), and Mongrel was light-years ahead of FastCGI. The problem with Mongrel? It’s single-threaded. It was deemed that the cost of shipping fast was more important than CPU efficiency, a choice I agreed with. We were one of the first products at LinkedIn to do i18n (well, we only did translations) via gettext. We deployed using Capistrano, and were the first ones to use nginx. We did a lot of other cool stuff, like experiment with Redis, learn a lot about memcached in production (nowadays this is a given, but there was a lot of memcached vs EHCache talk back then). Etc, etc. But I’m not trying to talk about how much fun I had on that team. Well, not primarily.

I’m here to clear up facts about the post about moving to node.js. And to do that, I’m going to back to my story.

The iPhone SDK had shipped around that time. We didn’t have an app ready for launch, but we wanted to build one, so our team did, and we inadvertently became the mobile team. So suddenly, we decided that this array of Rails servers that made API calls to PAL (which was, back then, using a pre-OAuth token exchange technology that was strikingly similar to OAuth) would also be the primary API server for the iPhone client and any other rich mobile client we’d end up building – this thing that was basically using Rails rhtml templates. We upgraded to Rails 2.x+ so we could have the respond_to directive for different outputs. Why didn’t we connect the iPhone client directly to PAL? I don’t remember. Oh, and we also decided to use OAuth for authenticating the iPhone client. Three-legged OAuth, so we also turned those Rails servers into OAuth providers. Why did we use 3-legged OAuth? Simple: we had no idea what we were doing. I’LL ADMIT IT.

Did I mention that we hosted outside the main data centers? This is what Joyent talks about when they say they supplied LinkedIn with hosting. They never hosted linkedin.com proper on Joyent, but we had a long provisioning process for getting servers in the primary data center, and there were these insane rules about no scripting languages in production, so we decided it was easier to adopt an outside provider when we needed more capacity.

Here’s what you were accessing if you were using the LinkedIn iPhone client:

iPhone -> m.linkedin.com (running on Rails) -> LinkedIn’s API (which, for all intents and purposes, only had one client, us)

That’s a cross data center request, guys. Running on single-threaded Rails servers (every request blocked the entire process), running Mongrel, leaking memory like a sieve (this was mostly the fault of gettext). The Rails server did some stuff, like translations and transformation of XML to JSON, and we tested out some new mobile-only features on it, but beyond that it didn’t do a lot. It was little more than a proxy – a proxy with a maximum concurrency factor dependent on how many single-threaded Mongrel servers we were running. The Mongrels, as we affectionately referred to them, often bloated up to 300MB of RAM each, so we couldn’t run many of them.
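You can put rough numbers on that concurrency ceiling: with single-threaded workers, each process serves one request at a time, so peak throughput is roughly workers divided by per-request time (Little’s law). The figures below are illustrative, not LinkedIn’s actual numbers:

```java
public class ProxyThroughput {
    // With single-threaded workers, peak throughput ~= workers / requestSeconds.
    static double maxRequestsPerSecond(int workers, double requestSeconds) {
        return workers / requestSeconds;
    }

    public static void main(String[] args) {
        // Say each proxied call spends 200ms blocked on the cross-datacenter hop,
        // and 300MB-per-Mongrel memory bloat caps you at 8 processes per box.
        System.out.println(maxRequestsPerSecond(8, 0.2) + " req/s per box");
    }
}
```

A non-blocking server sidesteps this entirely: while one request waits on the remote data center, the process can service others.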

At this time, I was busy productionizing JRuby. JRuby, you see, was taking full advantage of Rails’ ability to serve concurrent requests using JVM concurrency. In addition, JRuby outperformed MRI in almost every real benchmark I threw at it – there were maybe 1 or 2 specific benchmarks where it didn’t. I knew that if we ported the mobile server to JRuby, we could have gotten more performance and way more concurrency. We would have kept the same ability to deploy fast, with the option to call directly into many of the Java libraries LinkedIn was using.

But we didn’t. Instead, the engineering manager at the time ruled in favor of Phusion Passenger, which, to be fair, was an easier port than JRuby. We had come to depend on various native extensions, gettext being the key one, and we didn’t have time to port the translations to something that was JRuby friendly. I was furious, of course, because I had been evangelizing JRuby as the best Ruby production environment and no one was listening, but that’s a different story for a different time. Well, maybe some people listened; those Square guys come to mind.

This was about the time I left LinkedIn. As far as I know, they didn’t build a ton more features. Someone told me that one of my old teammates became fascinated with node.js and pretty much singlehandedly decided to rewrite the mobile server using node. Node was definitely a better fit for what we were doing, since we were constantly blocking on a cross data center call, and non-blocking servers have been shown to be highly advantageous for I/O-bound workloads. Not to mention: we never intended for the original Ruby on Rails server to be used as a proxy for several years.

So, knowing all the facts, what are all the takeaways?

  • Is v8 faster than MRI? MRI is generally slower than YARV (Ruby 1.9), and, at least in these benchmarks, I don’t think there is any question that v8 is freakin’ fast. If node.js blocked on I/O, however, this fact would have been almost completely irrelevant.
  • The rewrite factor. How many of us have been on a software engineering project where the end result looked nothing like what we planned to build in the first place? And, knowing fully the requirements, we know that, if given the time and opportunity to rebuild it from scratch, it would have been way better. Not to mention: I grew a lot at LinkedIn as a software engineer, so the same me several years later would have done a far better job than the me of 2008. Experience does matter.
  • I see that one of the advantages of the mobile server being in node.js is that people could “leverage” (LinkedIn loves that word) their JavaScript skills. Well, LinkedIn had/has hundreds of Java engineers! If that was a concern, we would have spent more time exploring Netty. Lies, damn lies, and benchmarks, I always say, but I think it’s safe to say that Netty is at least as fast as node.js for web serving (the benchmark I have in mind is actually vert.x, which sits on top of Netty).
  • Firefighting? That was probably a combination of several things: the fact that we were running MRI and leaked memory, or the fact that the ops team was 30% of a single guy.

What I’m saying here is use your brain. Don’t read the High Scalability post and assume that you must build your next technology using node.js. It was definitely a better fit than Ruby on Rails for what the mobile server ended up doing, but it is not a performance panacea. You’re comparing a lower level server to a full stack web framework.

That’s all for tonight, folks, and thank you internet for finally goading me out of hiding again.

– Ikai

Written by Ikai Lan

October 4, 2012 at 6:34 pm

Apps Script quick tips: building a stock price spreadsheet

with 4 comments

I’ve been using iGoogle less and less over the past few years. A few weeks ago, the team announced that iGoogle would be shutting down in November 2013. It’s not a huge loss to me, though I do check iGoogle several times a day. Why? Stock prices! I’ve been using the Stock Market gadget for years.

As it turns out, the functionality I want is very easy to replicate using Google Spreadsheets and Google Apps Script. I’m thoroughly convinced that Apps Script is the fastest way to wire up different Google services for custom functionality, and it provides services for accessing Google Finance data.

Knowing this, it’s incredibly easy to wire up a spreadsheet that has access to live finance data. The spreadsheet I use looks something like this:


We can pull this off in a few very easy steps.

Step 1: Create a spreadsheet.

I made a spreadsheet with the following column names:

Symbol | Price | Change | Change % | Details


My intended use is to populate the Symbol column and have the rest of the data in the other columns auto populated. The nice thing about writing scripts that integrate with spreadsheets is that we have a built in UI for making edits, sorting, filtering and searching. By using spreadsheets as our data entry and manipulation UI, our functionality is already more advanced than the functionality provided in the Stock Market gadget as well as many other online portfolio-at-a-glance services.

Step 2: Create the script

What we’re going to do is write a few functions in the Script Editor. Spreadsheet cells accept the standard set of built-in functions that do simple things like SUM, AVG, and so forth, but they can also accept custom functions that retrieve data from other Google services.

Click Tools -> Script Editor.


This will open up a new tab in your browser where you can write code. The default name of this file is Code.gs. Replace whatever is in the buffer with this:

function getStockPrice(symbol) {
  return FinanceApp.getStockInfo(symbol)["price"];
}

function getStockPriceChangePct(symbol) {
  return FinanceApp.getStockInfo(symbol)["changepct"];
}

function getStockPriceChange(symbol) {
  return FinanceApp.getStockInfo(symbol)["change"];
}

function getGoogleFinanceLink(symbol) {
  return "http://www.google.com/finance?q=" + symbol;
}
Your Script Editor should look like this:


FinanceApp.getStockInfo() returns a FinanceResult instance with a LOT of data. I only care about the basics: price, price change, and price change percentage. The functions I’ve defined reflect this.

Step 3: Add the functions into the cells

Now let’s go back to the spreadsheet tab. I’ve populated a few basic symbols under the Symbol column: GOOG (Google) and AAPL (Apple), two of my favorite companies. In column B2, enter this value:

=getStockPrice(A2)


Hit enter. If everything is working correctly, this will now populate with the latest price of whatever stock symbol is in A2. Let’s add the rest of the functions. In C2, enter:

=getStockPriceChange(A2)

And in D2, enter:

=getStockPriceChangePct(A2)




I like to have a link back to Google Finance if I ever want to do more research on a company, so in E2, add:

=getGoogleFinanceLink(A2)


This next part is hard to explain but shouldn’t be difficult for anyone who has used a spreadsheet program before. Highlight cells B2 through E2 (you can hold down shift and click to select them). Now hover your mouse over the bottom right corner of E2 and drag down a few rows. This copies the functions into the subsequent rows, but substitutes A3, A4, A5, … for A2, depending on the row. You can test this out by adding additional stock symbols. The live stock data will appear.

Step 4: Color Coding

I like to see color coding depending on whether a stock price has risen or fallen. Hold down shift and click on C, then D, at the top of the columns:


Click the arrow to the right. This should drop down a menu. Click on “Conditional Formatting”:


You’ll want to add two rules: a greater-than rule and a less-than rule. When the Change and Change % columns are greater than 0, change the background to green. When they are less than 0, change the background to red. Click “Save Rules”.

You’re done!


I’ve only scratched the surface of what can be done with Apps Script. Using Clock Events, we can check every few minutes for changes and email ourselves using the GmailApp library if a stock price change is greater than some threshold. We can generate charts based on historic data. And so on, and so forth. For more examples of things that can be done with Google Apps Script, check out the tutorials section for more ideas.

Have a great weekend!

– Ikai


Written by Ikai Lan

July 27, 2012 at 2:05 pm

Getting started with jOOQ: A Tutorial

with 10 comments


I accidentally stumbled onto jOOQ a few days ago while doing a lot of research on Hibernate. Funny how things work, isn’t it? For those of you who aren’t familiar with it, jOOQ is a different approach to the over-ORMing of Java persistence. Rather than trying to map database tables to Java classes and abstract away the SQL underneath, jOOQ assumes you want low-level control over the SQL queries you execute, and provides a mostly typesafe interface for executing them. I don’t have anything against simple ORMs, but it’s good to have the right tool for the right job. From the jOOQ homepage:

Instead of this SQL query:


You would execute this Java code:


Why a Java interface? Type safety, for one. Programmatically using jOOQ’s DSL has some advantages over writing SQL queries by hand, such as IDE support and compile time checking of some things.

The idea interested me and I dug in. Unfortunately, the jOOQ site’s documentation, while fairly comprehensive, DOES NOT PROVIDE AN END-TO-END “GETTING STARTED” PAGE!!! This means that if you want to learn jOOQ, you’ll have to jump to the chapter about meta model code generation, then jump to the DSL, then jump to the jOOQ classes section. It’s a bit of a mess for new users. A Google search also didn’t turn up many useful results, so I figured I’d whip up a quick “Getting started” guide. We’re going to go over the following steps:

Preparation: Download jOOQ and your SQL driver
Step 1: Create a SQL database and a table
Step 2: Generate classes
Step 3: Write a main class and establish a MySQL connection
Step 4: Write a query using jOOQ’s DSL
Step 5: Iterate over results
Step 6: Profit!

Ready? Let’s get started.

Getting our hands dirty

Preparation: Download jOOQ and your SQL driver

If you haven’t already downloaded them, download jOOQ:


For this example, we’ll be using MySQL. If you haven’t already downloaded MySQL Connector/J, download it here:


Stash these somewhere where you can get to them later.

Step 1: Create a SQL database and a table

We’re going to create a database called “guestbook” and a corresponding “posts” table. Connect to MySQL via your command line client and type the following:

create database guestbook;

CREATE TABLE `posts` (
  `id` bigint(20) NOT NULL,
  `body` varchar(255) DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

(I copied and pasted the create table statement from a “show create table” command)

Step 2: Generate classes

In this step, we’re going to use jOOQ’s command line tools to generate classes that map to the Posts table we just created. The official docs are here.

I’m going to augment the command line steps a bit. The easiest way to generate a schema is to copy the jOOQ jar files (there should be 3) and the MySQL Connector jar file to a temporary directory. Create a properties file. I’ve created a file called guestbook.properties that looks like this:

#Configure the database connection here
jdbc.Driver=com.mysql.jdbc.Driver
jdbc.URL=jdbc:mysql://localhost:3306/guestbook
jdbc.Schema=guestbook
jdbc.User=root
jdbc.Password=

#The default code generator. You can override this one, to generate your own code style
#Defaults to org.jooq.util.DefaultGenerator
generator=org.jooq.util.DefaultGenerator

#The database type. The format here is:
#generator.database=org.jooq.util.[database].[database]Database
generator.database=org.jooq.util.mysql.MySQLDatabase

#All elements that are generated from your schema (several Java regular expressions, separated by comma)
#Watch out for case-sensitivity. Depending on your database, this might be important!
generator.database.includes=.*

#All elements that are excluded from your schema (several Java regular expressions, separated by comma). Excludes match before includes
generator.database.excludes=

#Primary key / foreign key relations should be generated and used.
#This will be a prerequisite for various advanced features
#Defaults to false
generator.generate.relations=true

#Generate deprecated code for backwards compatibility
#Defaults to true
generator.generate.deprecated=false

#The destination package of your generated classes (within the destination directory)
generator.target.package=test.generated

#The destination directory of your generated classes
generator.target.directory=.

One thing that wasn’t clear from jOOQ’s docs is the value of jdbc.Schema: it should be your database name. Since our database name is “guestbook”, that’s what we put. Replace the username with whatever user has the appropriate privileges: in my local dev database, my user has what is effectively root access to everything without a password. You’ll want to look at the other values and replace as necessary. Here are the two interesting properties:

generator.target.package – set this to the parent package you want to create for the generated classes. My setting of test.generated will cause the test.generated.Posts and test.generated.PostsRecord to be created

generator.target.directory – the directory to output to. Worst case scenario you can just copy the files to the package.

Once you have the JAR files and jooq.properties in your temp directory, type this:

java -classpath jooq-1.6.8.jar:jooq-meta-1.6.8.jar:jooq-codegen-1.6.8.jar:mysql-connector-java-5.1.18-bin.jar:. org.jooq.util.GenerationTool /jooq.properties

Note the prefix slash before jooq.properties. Even though it’s in our working directory, we need to prepend a slash.

Replace the filenames with your filenames. In this example, I’m using jOOQ 1.6.8. If everything has worked, you should see this in your console output:

Nov 1, 2011 7:25:06 PM org.jooq.impl.JooqLogger info
INFO: Initialising properties  : /jooq.properties
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Database parameters      
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ----------------------------------------------------------
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   dialect                : MYSQL
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   schema                 : guestbook
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   target dir             : /Users/ikai/Documents/workspace/MySQLTest/src
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   target package         : test.generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ----------------------------------------------------------
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Emptying                 : /Users/ikai/workspace/MySQLTest/src/test/generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating classes in    : /Users/ikai/workspace/MySQLTest/src/test/generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating schema        : Guestbook.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating factory       : GuestbookFactory.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Schema generated         : Total: 122.18ms
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Sequences fetched        : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Masterdata tables fetched: 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Tables fetched           : 5 (5 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating tables        : /Users/ikai/workspace/MySQLTest/src/test/generated/tables
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ARRAYs fetched           : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Enums fetched            : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: UDTs fetched             : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating table         : Posts.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Tables generated         : Total: 680.464ms, +558.284ms
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating Keys          : /Users/ikai/workspace/MySQLTest/src/test/generated/tables
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Keys generated           : Total: 718.621ms, +38.157ms
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Generating records       : /Users/ikai/workspace/MySQLTest/src/test/generated/tables/records
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Generating record        : PostsRecord.java
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Table records generated  : Total: 782.545ms, +63.924ms
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Routines fetched         : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Packages fetched         : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: GENERATION FINISHED!     : Total: 791.688ms, +9.143ms

Step 3: Write a main class and establish MySQL connection

Let’s just write a vanilla main class in the project containing the generated classes:

public class Main {

	public static void main(String[] args) {
		Connection conn = null;
		String userName = "ikai";
		String password = "";
		String url = "jdbc:mysql://localhost:3306/guestbook";
		try {
			conn = DriverManager.getConnection(url, userName, password);
		} catch (Exception e) {
			// You'll probably want to handle the exceptions in a real app.
			// Don't ever do this silent catch (Exception e) thing. I've seen this in
			// live code and it is horrendous.
			e.printStackTrace();
		}
	}
}


This is pretty standard code for establishing a MySQL connection.

Step 4: Write a query using jOOQ’s DSL

Let’s add a simple query:

			GuestbookFactory create = new GuestbookFactory(conn);
			Result<Record> result = create.select().from(Posts.POSTS).fetch();

We first need an instance of GuestbookFactory so we can write a simple SELECT query. We pass an instance of the MySQL connection to GuestbookFactory. Note that the factory doesn’t close the connection; we’ll have to do that ourselves.

We then use jOOQ’s DSL to return an instance of Result. We’ll be using this result in the next step.

Step 5: Iterate over results

After the line where we retrieve the results, let’s iterate over the results and print out the data:

			for (Record r : result) {
				Long id = r.getValueAsLong(Posts.ID);
				String title = r.getValueAsString(Posts.TITLE);
				String description = r.getValueAsString(Posts.BODY);
				System.out.println("ID: " + id + " title: " + title + " description: " + description);
			}

The full program should now look like this:

package test;

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.Record;
import org.jooq.Result;

import test.generated.GuestbookFactory;
import test.generated.tables.Posts;

public class Main {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		Connection conn = null;
		String userName = "ikai";
		String password = "";
		String url = "jdbc:mysql://localhost:3306/guestbook";
		try {
			conn = DriverManager.getConnection(url, userName, password);

			GuestbookFactory create = new GuestbookFactory(conn);
			Result<Record> result = create.select().from(Posts.POSTS).fetch();
			for (Record r : result) {
				Long id = r.getValueAsLong(Posts.ID);
				String title = r.getValueAsString(Posts.TITLE);
				String description = r.getValueAsString(Posts.BODY);
				System.out.println("ID: " + id + " title: " + title + " description: " + description);
			}
		} catch (Exception e) {
			// You'll probably want to handle the exceptions in a real app.
			// Don't ever do this silent catch (Exception e) thing. I've seen this in
			// live code and it is horrendous.
			e.printStackTrace();
		}
	}
}

Step 6: Profit!

Get a job and go to work like the rest of us.


I haven’t explored the more advanced bits of jOOQ, but, at least judging from the docs, it looks like there’s a lot of meat there. I’m hoping this guide makes it easier for new users to dive in.

– ikai
Currently listening: Sweat – Snoop Dogg vs David Guetta

Written by Ikai Lan

November 1, 2011 at 6:54 pm

On Hackathons, Process, Email and the Tragedy of the Commons

with 3 comments


I love hackathons. I love going to them, and I love running them. Most recently, I participated in a 48-hour hackathon in Kuala Lumpur, Malaysia. It’s one of the best parts of my job: I get to run (and sometimes participate in) both external hackathons and hackathons internal to Google.

In early June I held an internal Hackathon at Google to teach employees how to best use the product I work on: Google App Engine. I consider the event a success: we had hundreds of RSVPs and a completely booked room. It was so successful, in fact, that I’m planning on holding at least one of these events a quarter. The attendees were primarily newer employees, which didn’t surprise me given the amount of hiring we’ve been doing.

A primary driver for the sheer volume of RSVPs was the fact that we advertised the event on a mailing list that went out to pretty much all of engineering. All. Of. It. At an engineering company with headcount in the tens of thousands, hundreds of RSVPs were not only likely, they were pretty much a mathematical certainty. Looking back, we probably would not have received that response if we hadn’t sent out such a wide blast.

As a result of what I consider to be a fairly successful event (and I don’t mean to take all the credit here, at about the same time as my event, there was another very successful internal hackathon), various teams have suggested hackathons for their product APIs. There are events on the calendar.

Therein, of course, lies our problem. The problem of noise.

What should we do? Email all of engineering for every event? Create a new list/site/page announcing new events? Let’s break down the tradeoffs for each choice:

1. Email all of engineering

Pros: goes to everyone

Cons (and this is the bigger point of this post): the majority of events will be irrelevant, dropping the signal-to-noise ratio on the list significantly and causing people to filter out these announcements

2. Create a new distribution channel for events

Pros: Opt in

Cons: You don’t get the distribution you’d get with #1, since only a minority of people will opt in. Also – has the same SNR problems.

Now, a hybrid solution would be to do both: high profile, important events go to all of engineering, and smaller events go to the special distribution channel. The issue here is that everyone’s event is high profile. So again, we don’t have a great solution. Not to mention: people can only attend so many hackathons and still do all the stuff they’re supposed to be doing.

See, that’s one of the great things about Google engineering: if you’re consistently delivering, there isn’t a manager in the company who will tell you not to attend a hackathon or internal event where you can only get better at what you do. The issue, of course, is that the more hackathons take place, the more likely you are to take a resource away from another team for a non-trivial amount of time. From a hackathon organizer’s perspective, a hackathon is almost always beneficial as long as some non-zero number of participants show up: they learn about your API, provide feedback, and you learn a bit about how to improve the documentation or SDK. You almost can’t afford not to throw a hackathon.

This is the classic example of the tragedy of the commons. By running an event, you consume space. You consume employee time. You generate noise on all the distribution channels. And when everyone does it, suddenly, as a whole, everyone is worse off, though you yourself may individually gain.


Another key example of the tragedy of the commons is a company’s email marketing. I worked at a consumer internet company that organized teams by product. To drive usage metrics for an individual product, the product managers would run email campaigns to the site’s millions of users. The result was that the individual product would receive a bump in usage, and everyone would give themselves a pat on the back. What was actually happening was that the voluminous amounts of email being sent all the time were making users (myself included) extremely irritated at the company. Sure, you could go to the site settings and disable email, but new products would automatically opt you in to receiving notifications, and you would have to log back into the site to find the settings and disable those notifications as well. Some users, like myself, created Gmail filters to send all emails from this company’s domain straight to a “Stupid Mail” label. I can understand the individual product managers’ reasoning: you don’t want to be the one team that doesn’t deliver metrics, so you email spam. And when everyone email spams, it’s to the detriment of the company overall. An employee posted to an internal group asking if this was an example of the tragedy of the commons. I don’t know if the point was ever heeded, but based on the complaints I see on Twitter about the email, my guess is no.


I view team processes the same way, and this sometimes leads to some very heated discussions with people I work with. It’s not that I don’t believe making your one-step process a five-step process can make your life easier or the company better organized; it’s that everybody wants to turn their lightweight, free-form, one-step processes into full-on, form-driven, strict-requirements-based, signed-in-triplicate procedures. I fight heavy processes when I can because I don’t believe enough people do so. Why? The tragedy of the commons. An extra 20 minutes here, an extra 20 minutes there, and suddenly I am spending most of my day tangled in process instead of getting things done.

There are no easy solutions to this, of course. Some process is necessary, though from the outset it isn’t always obvious which parts. How do you know, for instance, if a process is unnecessary? A good example is a managerial approval step. Let’s say I need approval to do something. How do I evaluate whether managerial approval is working?

  • What is the cost of doing it wrong? What was the bad outcome?
  • In how many incidents, prior to instituting the process, would that approval have prevented a bad outcome?
  • Is the manager rubber stamping requests?

What absolutely needs to be done are constant evaluations of process. Don’t create a process and sit on it. Make it better. What can you take away, and still have it work? Think about your last trip to the DMV. How many steps could have been eliminated?

Awareness of the bigger picture

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

Antoine de Saint-Exupery
French writer (1900 – 1944)

I suppose that’s the solution to fighting the tragedy of the commons: a constant awareness of the bigger picture and a real desire to make things better. An understanding that many things in this world are a zero-sum game. I’ll issue a caution, of course: you can probably only champion a few things. Champion fixing everything, and people stop listening to you, you lose focus, and you end up fixing very little. What do we call this effect? No, I won’t bother. Hopefully you actually read this and already know.

– Ikai

Written by Ikai Lan

July 16, 2011 at 2:24 pm

Setting up an OAuth provider on Google App Engine

with 25 comments

App Engine provides an API for easily creating an OAuth provider. In this blog post, I’ll describe the following steps:

  1. Create and deploy an App Engine application that implements the OAuth API
  2. Add a new domain to your Google Account. Verify this domain.
  3. Connecting an OAuth client to make requests against your application

I’ll avoid a deep explanation of OAuth for now. You can find everything you need to know about OAuth in the Beginner’s guide to OAuth.

Get the code

The code that goes along with this blog post is available here:


The two most important files are:

  • python/oauth_client.py
  • src/com/ikai/oauthprovider/ProtectedServlet.java

Step 1: Create and deploy an App Engine application that uses the OAuth API

Create a new App Engine Java application. I’ve created a servlet called ProtectedServlet:

package com.ikai.oauthprovider;

import com.google.appengine.api.oauth.OAuthRequestException;
import com.google.appengine.api.oauth.OAuthService;
import com.google.appengine.api.oauth.OAuthServiceFactory;
import com.google.appengine.api.users.User;

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ProtectedServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
	    throws IOException {
	User user = null;
	try {
	    OAuthService oauth = OAuthServiceFactory.getOAuthService();
	    user = oauth.getCurrentUser();
	    resp.getWriter().println("Authenticated: " + user.getEmail());
	} catch (OAuthRequestException e) {
	    resp.getWriter().println("Not authenticated: " + e.getMessage());
	}
    }
}

This servlet is incredibly simple. We retrieve an instance of OAuthService via OAuthServiceFactory and attempt to fetch the current user. Note that the User instance is the same kind of instance as a User returned by UserService. That’s because a User is still expected to sign in via a Google Account.

The method getCurrentUser() takes care of all of the OAuth signature verification. If something goes wrong – say, the request is not signed, or the signature is invalid, or the client’s timestamp is outside of the acceptable skew, or the nonce is repeated – OAuthService throws OAuthRequestException.
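Those checks happen inside App Engine, but conceptually the replay protections look something like this toy Python sketch. The skew window, the return strings, and the in-memory nonce set are illustrative assumptions for this post, not App Engine's actual implementation:

```python
import time

# Hypothetical skew window; the real value is internal to App Engine.
ALLOWED_SKEW_SECONDS = 300

def check_request_freshness(request_timestamp, nonce, seen_nonces, now=None):
    """Reject requests whose timestamp is outside the allowed skew,
    or whose nonce has already been seen (a replayed request)."""
    now = time.time() if now is None else now
    if abs(now - request_timestamp) > ALLOWED_SKEW_SECONDS:
        return "timestamp outside acceptable skew"
    if nonce in seen_nonces:
        return "nonce repeated"
    seen_nonces.add(nonce)
    return "ok"
```

Either failure here is roughly the point at which the real API would throw OAuthRequestException, on top of the signature check itself.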

We can run this code locally, but it won’t work: when run locally, oauth.getCurrentUser() always returns a test user. We’ll need to deploy it to App Engine before it’ll do verification. After deploying, we can test the servlet. I have the servlet mapped to /resource. When we browse to this URL, we see:

Not authenticated: Unknown

That’s okay. We expect to see this because we’re sending a vanilla GET to this API.

2. Add a new domain to your Google Account. Verify this domain

OAuth clients require a consumer key and consumer token. We need to generate these. Browse to the “Manage Domains” page:


It should look like this:

Add the base URL of our App Engine app into the text box in the “Add a New Domain” section and click “Add domain”. For instance, I entered: http://ikai-oauth.appspot.com.

We’ll be taken to a new page where we need to verify ownership of the application:

Download the HTML verification file and place it into our war directory. Deploy this new version of the application to App Engine. Once we have confirmed that the page is serving, click “Verify” to complete the verification process.

When we have verified our domain, we will be asked to accept the Terms of Service and enter a few settings. Only the authsub setting is required; we can enter anything we want here because we will not be using authsub. We will then be presented with an OAuth consumer key and OAuth consumer secret. The OAuth consumer key is simply the domain, whereas the consumer secret is an autogenerated shared secret that clients will be using.

Now that we have these values, we can move on to step 3.

3. Connecting an OAuth client to make requests against your application

As of the time of this writing, App Engine only supports OAuth 1.0.

Below is a basic script that will do the 3-legged OAuth dance, cache access tokens locally and make API calls. To run this script, you will need to install the python-oauth2 library. If we have git installed, the commands to install the library on a *Nix like system are:

git clone https://github.com/simplegeo/python-oauth2.git
cd python-oauth2
sudo python setup.py install

This installs the oauth2 library into your Python installation so we can import it when we need it.

Now we can run the script to make authenticated calls against our app. Note that we’ll want to substitute the consumer_secret and app_id values with values that map to your application ID and consumer secret:

import oauth2 as oauth
import urlparse
import os
import pickle

app_id = "your_app_id_here"
url = "http://%s.appspot.com/resource" % app_id

consumer_key = '%s.appspot.com' % app_id
consumer_secret = 'your_consumer_secret_here'

access_token_file = "token.dat"

request_token_url   = "https://%s.appspot.com/_ah/OAuthGetRequestToken" % app_id
authorize_url       = "https://%s.appspot.com/_ah/OAuthAuthorizeToken" % app_id
access_token_url    = "https://%s.appspot.com/_ah/OAuthGetAccessToken" % app_id

consumer = oauth.Consumer(consumer_key, consumer_secret)

if not os.path.exists(access_token_file):

    client = oauth.Client(consumer)

    # Step 1: Get a request token. This is a temporary token that is used for 
    # having the user authorize an access token and to sign the request to obtain 
    # said access token.

    resp, content = client.request(request_token_url, "GET")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])

    request_token = dict(urlparse.parse_qsl(content))

    print "Request Token:"
    print "    - oauth_token        = %s" % request_token['oauth_token']
    print "    - oauth_token_secret = %s" % request_token['oauth_token_secret']

    print "Go to the following link in your browser:"
    print "%s?oauth_token=%s" % (authorize_url, request_token['oauth_token'])

    # After the user has granted access to you, the consumer, the provider will
    # redirect you to whatever URL you have told them to redirect to. You can 
    # usually define this in the oauth_callback argument as well.
    accepted = 'n'
    while accepted.lower() == 'n':
            accepted = raw_input('Have you authorized me? (y/n) ')

    # Step 3: Once the consumer has redirected the user back to the oauth_callback
    # URL you can request the access token the user has approved. You use the 
    # request token to sign this request. After this is done you throw away the
    # request token and use the access token returned. You should store this 
    # access token somewhere safe, like a database, for future use.
    token = oauth.Token(request_token['oauth_token'],
                        request_token['oauth_token_secret'])
    client = oauth.Client(consumer, token)

    resp, content = client.request(access_token_url, "POST")
    access_token = dict(urlparse.parse_qsl(content))

    print "Access Token:"
    print "    - oauth_token        = %s" % access_token['oauth_token']
    print "    - oauth_token_secret = %s" % access_token['oauth_token_secret']
    print "You may now access protected resources using the access tokens above." 

    token = oauth.Token(access_token['oauth_token'],
                        access_token['oauth_token_secret'])

    with open(access_token_file, "w") as f:
        pickle.dump(token, f)
else:
    with open(access_token_file, "r") as f:
        token = pickle.load(f)

client = oauth.Client(consumer, token)
resp, content = client.request(url, "GET")
print "Response Status Code: %s" % resp['status']
print "Response body: %s" % content

(The basis for this script was shamelessly stolen from Joe Stump’s sample oauth2 code for his Python library on Github.)

Once we run the script using:

python oauth_client.py

we should see:

Request Token:
- oauth_token        = SOME_OAUTH_REQUEST_TOKEN_VALUE
- oauth_token_secret = SOME_OAUTH_REQUEST_SECRET_VALUE

Go to the following link in your browser:

Have you authorized me? (y/n)

The OAuth token and token secret values are generated by the script using a combination of random values and the consumer key/secret pair. With these values, known as request tokens, we generate an authorization URL where the end user can bless our client, allowing it to make OAuth requests on behalf of that user.
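Building that authorization URL is just string formatting. Here is a small helper mirroring the print statement in the script above; it assumes App Engine's standard OAuthAuthorizeToken endpoint path, which the script also uses:

```python
def authorization_url(app_id, request_token):
    """Build the link the user visits to authorize our request token."""
    authorize_endpoint = "https://%s.appspot.com/_ah/OAuthAuthorizeToken" % app_id
    return "%s?oauth_token=%s" % (authorize_endpoint, request_token["oauth_token"])
```

For example, authorization_url("ikai-oauth", {"oauth_token": "abc123"}) produces the same kind of link the script prints before pausing for input.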

At this point, the script pauses for input. As part of the OAuth dance, we need to browse to the URL provided and authorize the script. Copy and paste this URL into your browser window and click “Grant Access”:

Once we see a page that says:

You have successfully granted ikai-oauth.appspot.com access to your Google Account. You can revoke access at any time under ‘My Account’.

We can switch back to our terminal window and enter “y”. The client now exchanges our request tokens for access tokens. Access tokens are what we need to make API calls. The script outputs this:

Access Token:
- oauth_token        = SOME_OAUTH_ACCESS_TOKEN
- oauth_token_secret = SOME_OAUTH_ACCESS_TOKEN_SECRET

You may now access protected resources using the access tokens above.

Response Status Code: 200
Response body: Authenticated: the-account-you-logged-in-with@gmail.com

The Python script caches the access token in a file called token.dat, so the next time we run oauth_client.py, we skip the authorization dance and can directly make API calls:

$ python oauth_client.py
Response Status Code: 200
Response body: Authenticated: the-account-you-logged-in-with@gmail.com
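The caching pattern the script uses can be factored into one small helper: return the token cached on disk if it exists, otherwise run the authorization dance once, cache the result, and return it. This is a sketch of the pattern, not part of the script above; the dance is represented by a caller-supplied function:

```python
import os
import pickle

def load_or_create_token(path, run_authorization_dance):
    """Return the token cached at `path` if it exists; otherwise run the
    (expensive, interactive) authorization dance, cache its result, and
    return it for this and future runs."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    token = run_authorization_dance()
    with open(path, "wb") as f:
        pickle.dump(token, f)
    return token
```

The second and later calls never invoke the dance at all, which is exactly why rerunning oauth_client.py goes straight to the API call.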

That’s all there is to it!

Final notes and general tips

Setting up an OAuth provider using App Engine’s API is incredibly simple once we know all the steps. Setting up the provider is just a matter of a few lines of code, and the steps to set up the client are pretty straightforward. The most difficult part is setting up the consumer key and secret, but even that isn’t so bad once we know where the management interface is.

When possible, use OAuth instead of ClientLogin. This goes for web applications, mobile applications, desktop apps, and even command line scripts. OAuth allows users to revoke your access token and trains users not to arbitrarily give out their Google Account password to any interface that asks for it. For building clients, it also gives you a way to do client authentication without having to cache credentials – using ClientLogin too often results in CaptchaRequiredException being thrown, anyway.

– Ikai


Github sample code:

App Engine/Java OAuth docs: http://code.google.com/appengine/docs/java/oauth/overview.html

Domain management – get your consumer key/secret here: https://www.google.com/accounts/ManageDomains

Python OAuth client code: https://github.com/simplegeo/python-oauth2

Written by Ikai Lan

May 26, 2011 at 5:23 pm

Unit Testing in Tipfy, an App Engine framework in Python

with 6 comments

I’ve been playing around with the Tipfy framework for App Engine. Tipfy is a framework built on top of App Engine’s APIs that provides many features on top of what is currently possible. I won’t go too much into their virtues here.

One thing that’s bothered me is the lack of a testing guide. More disturbing still is that one of the top search results for unit testing is a groups post from a developer bragging that he doesn’t write tests (let’s hope no one ever has to work with you). Digging around, it’s clear that Rodrigo Moraes, the creator of Tipfy, emphasizes testing in his own app, as evidenced by the testing package in the Tipfy source repository. I’ve decided to write this quick guide to save other developers some of the detective work I had to do to get unit tests running.


So – if you don’t want to read, you can just skip ahead and read this code sample which shows an example of how to write tests for the demo “Hello, World” application that comes as part of the Tipfy download.

Getting Started

We’re going to need a few different tools to run tests. Note that we don’t strictly need them; I just find that using these tools will make our lives a lot easier:

  • Nose – Nose is a popular Python test discovery and execution tool. Nose will dig through your source directory and run your tests
  • Nose GAE plugin – this is the plugin that makes nose play nice with the local App Engine SDK

If you don’t already have these tools installed, go ahead and install them with easy_install:

sudo easy_install nose
sudo easy_install nosegae

We’ll also need to make sure tipfy is on our PYTHONPATH. Look for tipfy under YOUR_TIPFY_INSTALL/app/distlib. Here’s what I see as of the writing of this post:

distlib ikai$ ls
README.txt	babel		jinja2		tipfy		werkzeug

Add this to your PYTHONPATH by adding a line to ~/.bash_profile (or equivalent on your system):

export PYTHONPATH="/path/to/root/of/tipfy/libraries"

If needed, run:

source ~/.bash_profile

Alright, you’re ready to roll. Run a test from the root of your application directory. It’s probably easiest to do this from the directory app.yaml resides in:

nosetests -d --with-gae --without-sandbox -v

Note that this assumes your App Engine SDK lives at /usr/local/google_appengine. If it doesn’t, either symlink it or pass the --gae-lib-root flag.

You only really need the --with-gae and --without-sandbox flags, but I like the other flags. Type nosetests --help for a full description of the commands available.

Now let’s write some tests.

Writing tests

Now let’s create a new file for tests. Tipfy has a concept of apps within a project (think Django apps), so for this example, I’ll create a file called tests.py in each app directory, for organization (we’ll have to remember to add a setting in app.yaml to avoid uploading this file, but this isn’t crucial). The responsibility of the tests in this file will be to run the tests for the app they’re colocated with. It’d be equally valid to create a test directory.

Here’s our tests.py:

import unittest

from tipfy import RequestHandler, Tipfy
import urls

class TestHandler(unittest.TestCase):
    def setUp(self):
        self.app = Tipfy(rules=urls.get_rules(None))        
        self.client = self.app.get_test_client()

    def test_hello_world_handler(self):        
        response = self.client.get('/', follow_redirects=True)
        self.assertEquals(response.data, "Hello BLAH")

    def test_pretty_hello_world_handler(self):
        response = self.client.get('/pretty')
        self.assertTrue("Hello, World!" in response.data)

Let’s talk through what we’re doing here step by step:

    def setUp(self):
        self.app = Tipfy(rules=urls.get_rules(None))
        self.client = self.app.get_test_client()

If you’re used to Python testing, this shouldn’t look too surprising to you. The setUp function is run before each test. We’re doing two things here:

  1. Initialize an instance of the app. We’ve imported the urls module from this app, so we can call get_rules() on it to get our URL mappings. We’re passing None to this because it expects an app, but as luck would have it, the “Hello World” demo doesn’t actually use this parameter.
  2. We’re initializing an instance of the test client. This is what we’ll be using to make requests

Now let’s talk about the tests

    def test_hello_world_handler(self):
        response = self.client.get('/', follow_redirects=True)
        self.assertEquals(response.data, "Hello BLAH")

    def test_pretty_hello_world_handler(self):
        response = self.client.get('/pretty')
        self.assertTrue("Hello, World!" in response.data)

In test_hello_world_handler(), we use self.client.get() to make a call to the “/” URL. Note that we’ve passed a follow_redirects argument; we don’t actually need this. This is just something I copied over from Rodrigo’s original testing example. We test to ensure that the response equals the expected output.

In our second test, we test the “pretty” version of this handler. We look for a string inside the response, but really it’s up to us how we want to do this. In general, we don’t want to look for an exact match of the output, since that makes our test extremely brittle, and we’ll end up either not maintaining it or deleting it.
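To make the contrast concrete, here is a small sketch of the two assertion styles against a fake response body; the markup is made up for illustration:

```python
def has_fragment(response_data, fragment):
    """Robust style: check only for the fragment we actually care about."""
    return fragment in response_data

# Fake rendered page, standing in for response.data.
page = "<html><body><h1>Hello, World!</h1></body></html>"

# Robust: survives changes to the markup around the greeting.
robust_passes = has_fragment(page, "Hello, World!")

# Brittle: an exact match on the full output fails here, and even when it
# passes, it breaks on any future template tweak.
brittle_passes = (page == "Hello, World!")
```

The robust check passes while the exact match fails, which is exactly why exact-match assertions on rendered templates rot so quickly.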

Advanced users will likely have all their handlers extend a BaseHandler RequestHandler class and call self.render(). We can point the render method at a mock, then capture the context parameters that were passed (this is a bit out of scope for this post, but I may follow up with some quick samples of how to do mocking; I like Michael Foord’s Mock library).
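A minimal sketch of that mocking idea follows. BaseHandler and GreetingHandler here are hypothetical stand-ins for your own classes, and this uses unittest.mock (available as the standalone `mock` package on the Python 2 versions this post targets; the API is the same):

```python
from unittest import mock

class BaseHandler(object):
    def render(self, template, **context):
        # A real app would render a template here; tests replace this method.
        raise NotImplementedError

class GreetingHandler(BaseHandler):
    def get(self):
        self.render("greeting.html", name="World", visits=3)

def test_greeting_context():
    handler = GreetingHandler()
    # Patch render on the class so the call is captured instead of executed.
    with mock.patch.object(GreetingHandler, "render") as fake_render:
        handler.get()
        # Assert on the template and context without rendering anything.
        fake_render.assert_called_once_with("greeting.html", name="World", visits=3)
```

This lets the test verify what the handler tried to render without depending on the rendered HTML at all.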

Writing tests with the datastore

Let’s do something a bit more interesting. Let’s run some tests with the datastore. We’ll also demonstrate some other ways of testing Tipfy. Let’s consider the following, updated code snippet:

# Install nose and nosegae:
#   sudo easy_install nose
#   sudo easy_install nosegae
# run via:
#  nosetests --with-gae --without-sandbox -v

import unittest

from tipfy import RequestHandler, Rule, Tipfy
# Need this import for testing
from google.appengine.api import apiproxy_stub_map, datastore_file_stub
from google.appengine.ext import db
import urls

class Comment(db.Model):
    body = db.StringProperty()

class TestHandler(unittest.TestCase):

    def setUp(self):
        """We use this to clear the datastore. Thanks to Gaetestbed for
        the example here:
        """
        datastore_stub = apiproxy_stub_map.apiproxy._APIProxyStubMap__stub_map['datastore_v3']
        datastore_stub.Clear()

        # We're importing rules from the sample app
        # The sample app doesn't require an app
        self.app = Tipfy(rules=urls.get_rules(None))
        self.client = self.app.get_test_client()

    def test_hello_world_handler(self):
        response = self.client.get('/', follow_redirects=True)
        self.assertEquals(response.data, "Hello BLAH")

    def test_pretty_hello_world_handler(self):
        response = self.client.get('/pretty')
        self.assertTrue("Hello, World!" in response.data)

    def test_save_comment(self):
        class DatastorePostHandler(RequestHandler):
            def post(self):
                body = self.request.form.get("body")
                comment = Comment()
                comment.body = body
                comment.put()
                return "OK"

        rules = [
            Rule('/ds', endpoint='ds', handler=DatastorePostHandler),
        ]

        app = Tipfy(rules=rules)
        client = app.get_test_client()
        response = client.post('/ds')
        self.assertEquals(response.data, "OK")
        comments = Comment.all().fetch(100)
        self.assertEquals(1, len(comments))

Revisiting the setUp() method, we see that we have new lines of code:

datastore_stub = apiproxy_stub_map.apiproxy._APIProxyStubMap__stub_map['datastore_v3']
datastore_stub.Clear()

Between test invocations, the datastore stub is NOT cleared, so we have to clear it ourselves – the last thing we want is to have state persist between tests. That’s a very bad practice I occasionally see in “clever” attempts to save lines of code. Don’t do it. It causes flaky tests and will give you hours of pain. Reset your state and rebuild it each time.

test_save_comment() defines a handler and a set of rules for our Tipfy instance. We probably won’t be doing this for non-trivial applications, since the whole point is to test some handler code we wrote, but it serves our purpose for this example. We want to test for a side effect – in this case, that a comment was saved. In a more complete test, we would not only test for the number of comments, but we’d also test that the body was saved. Notice the difference in our call to client.post() – this invokes an HTTP POST instead of an HTTP GET.

When we run nosetests with the command above, we get:

$ nosetests -d --with-gae --without-sandbox -v
test_hello_world_handler (apps.hello_world.tests.TestHandler) ... ok
test_pretty_hello_world_handler (apps.hello_world.tests.TestHandler) ... ok
test_save_comment (apps.hello_world.tests.TestHandler) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.206s

OK

And life is good again.

Final notes on testing

I’m not one of those people who believe that 100% test coverage, or even 80% test coverage, is needed for a project to be well covered. The payoff for that much coverage is often relatively minor compared to the lots and lots of test code it requires, especially for trivial code paths.

I also see a lot of developers completely isolate each layer of the stack. In the datastore example above, these developers would have completely mocked out the datastore layer. I don’t find this to be a useful practice by default, as you end up testing your mocks and not the code. There are cases where this practice is useful, but in most cases, you will have more confidence in your code if you take the time to define a correct set of fixtures. Where you’ll 100% want mocks are places where you have complex or external services that can be flaky, or when you need to replicate failure conditions that are difficult to programmatically cause in your code.
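To make the failure-condition point concrete, here’s a minimal sketch using the standard library’s unittest.mock. The fetch_profile function and get_profile method are hypothetical names, not a real service API:

```python
import unittest
from unittest import mock


def fetch_profile(client, user_id):
    """Fetch a user profile from some (hypothetical) external service,
    falling back to a stub profile when the service is unreachable."""
    try:
        return client.get_profile(user_id)
    except IOError:
        return {'id': user_id, 'name': 'unknown'}


class TestFetchProfile(unittest.TestCase):
    def test_outage_falls_back_to_stub(self):
        # One line of mock configuration simulates an outage that would
        # be very hard to trigger programmatically against the real thing
        client = mock.Mock()
        client.get_profile.side_effect = IOError('connection refused')
        self.assertEqual(fetch_profile(client, 42)['name'], 'unknown')
```

This is the case where a mock earns its keep: the external dependency is flaky, and the failure path is exactly what you want under test.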

Don’t think of testing as a replacement for QA, because it’s not. In web testing, think of it as a replacement for opening a browser and clicking. When you discover a bug, write a test for it and then fix it, because in most cases setting up the error state will be much easier programmatically than manually. You’re always going to have to do browser testing at some point, but it’s time-consuming, especially if you need your data in a specific state. You could go the Selenium route for full coverage, but in my experience (people are going to disagree with me on this – get ready for comment/Twitter trolling), Selenium tests, while providing a high level of confidence, are also extremely brittle and a maintenance nightmare if you have too many of them. You’ll want to write as many tests as you can outside the browser environment and save Selenium for the minority of your user flows that are critical – write Javascript unit tests instead of Selenium tests for client side functionality. I’ve used JsUnit before, and I’ve heard good things about Jasmine but have no experience with it myself.

And my last tip? Do what works for your team. But do write tests, because it’s one of those practices that will pay off over time if you write AND maintain them well.

– Ikai

Written by Ikai Lan

February 19, 2011 at 2:12 am

App Engine datastore tip: monotonically increasing values are bad

with 20 comments

When saving entities to App Engine’s datastore at a high write rate, avoid monotonically increasing values such as timestamps. Generally speaking, you don’t have to worry about this sort of thing until your application hits 100s of queries per second. Once you’re in that ballpark, you may want to examine potential hotspots in your application that can increase datastore latency.

To explain why this is, let’s examine what happens to the underlying Bigtable of an application with a high write rate. When a Bigtable tablet, a contiguous unit of storage, experiences a high write rate, the tablet will have to “split” into more than one tablet. This “split” allows new writes to shard. Here’s a visual approximation of what happens:

There’s a moment of pain – this is one of the causes of datastore timeouts in high write applications, as discussed in Nick Johnson‘s article, “Handling Datastore Errors“.

Remember that for indexed values, we must write corresponding index rows. When values are randomly or even semi-randomly distributed, like, say, user email addresses, tablet splits function well. This is because the work to write multiple values is distributed amongst several Bigtable tablets:

The problems appear when we start saving monotonically increasing values like timestamps, or insert dictionary words in alphabetical order:

The new writes aren’t evenly distributed, and whichever tablet they end up going to becomes the new hot tablet in need of a split.

As a developer, what can you do to avoid this situation?

  • Avoid indexes unless you need to query against the values. No index = no hot tablet on increasing value
  • Lower your write rate, or figure out how to better distribute values. A pure random distribution is best, but even a distribution that isn’t random will be better than a predictable, monotonically increasing value
  • Prefix a shard identifier to your value. This is problematic if you plan on doing queries, as you will need to prefix and unprefix the values, then join the results in memory – but it will reduce the error rate of your writes
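To illustrate the shard-prefix idea, here’s a minimal, datastore-free sketch in plain Python. The key format and shard count are arbitrary choices for illustration, not an App Engine API:

```python
import random

NUM_SHARDS = 8  # assumption: tune this to your write rate


def sharded_key(timestamp_ms):
    """Prefix a random shard id so monotonically increasing timestamps
    are spread across NUM_SHARDS key ranges instead of piling onto one
    hot tablet. Zero-padding keeps keys sortable within a shard."""
    shard = random.randint(0, NUM_SHARDS - 1)
    return '%d|%017d' % (shard, timestamp_ms)


def unprefix(key):
    """Strip the shard prefix to recover the original timestamp."""
    _, _, timestamp = key.partition('|')
    return int(timestamp)


# 1000 monotonically increasing writes now land on up to 8 key ranges
keys = [sharded_key(1295900000000 + i) for i in range(1000)]

# The query-side cost: to read by time you fan out one query per shard
# prefix, then merge and re-sort the results in memory
timestamps = sorted(unprefix(key) for key in keys)
```

The write side gets cheap distribution; the read side pays for it with the fan-out-and-merge step, which is exactly the trade-off described above.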

These tips are applicable whether you are on the Master-Slave or High Replication datastore. And one more tip: don’t prematurely optimize for this case, since chances are you won’t run into it – you could be spending that time working on features.

– Ikai

P.S. Yes, I drew those doodles. No, I do not have any formal art training (how could you tell?!)

Written by Ikai Lan

January 25, 2011 at 6:26 pm

GWT, Blobstore, the new high performance image serving API, and cute dogs on office chairs

with 22 comments

I’ve been working on an image sharing application using GWT and App Engine to familiarize myself with the newer aspects of GWT. The project and code are here:


(Please excuse the spaghetti code on the client side; much of it was me feeling my way around GWT. I’ve come to appreciate GWT quite a bit, in spite of the fact that I’m pretty familiar with client side development; I’ll write about this in a future post.)

The 1.3.6 release of the App Engine SDK shipped with a high performance image serving API. What this means is that a developer can take a blob key pointing to image data stored in the blobstore and call getServingUrl() to create a special URL for serving the image. What are the benefits to using this API?

  • You don’t have to write your own handler for uploaded images
  • You don’t have to consume storage quota for saving resized or cropped images, as you can perform transforms on the image simply by appending URL parameters. You only need to store the final URL that is generated by getServingUrl().
  • You aren’t charged for datastore CPU for fetching the image (you will still be billed for bandwidth)
  • Images are, in general, served from edge server locations which can be geographically located closer to the user

There are a few drawbacks, however, to using the API:

  • There aren’t any great schemes for access control of the images, and if someone has the URL for a thumbnail, they can easily remove the parameters to see a larger image
  • Billing must be enabled – you will only be charged for usage, however, so you don’t have to spend a cent to use the API. You just have to have billing active.
  • Deleting an image blob doesn’t delete the image being served from the URL right away – that image will still be available for some time
  • Images must be uploaded to the blobstore, not the datastore as a blob, so it’s important to understand how the blobstore API works
  • The URLs of the created images are really, really ugly. If you need pretty URLs, it’s probably a better pattern to create a URL mapping to an HTML page that just displays the image in an IMG tag

Blobstore crash course

It’ll be best if we gave a quick refresher course on the blobstore before we begin. Here’s the standard flow for a blobstore upload:

  1. Create a new blobstore session and generate an upload URL for a form to POST to. This is done using the createUploadUrl() method of BlobstoreService. Pass a callback URL to this method. This URL is where the user will be forwarded after the upload has completed.
  2. Present an upload form to the user. The action is the URL generated in step 1. Each URL must be unique: you cannot use the same URL for multiple sessions, as this will cause an error.
  3. After the file has been uploaded, the user is forwarded to the callback URL in your App Engine application specified in step 1. The key of the uploaded blob, a String blob key, is passed as a URL parameter. Save this key, then send the user to their final destination.

Got it? Now we can talk about image serving.

Using the image serving URL

Once we have a blob key (step 3 of a Blobstore upload), we can do interesting things with it. First, we’ll need to create an instance of the ImagesService:

ImagesService imagesService = ImagesServiceFactory.getImagesService();

Once we have an instance, we pass the blob key to getServingUrl and get back a URL:

String imageUrl = imagesService.getServingUrl(blobKey);

This can sometimes take several hundred milliseconds to a few seconds to generate, so it’s almost always a good idea to run this on write as opposed to first read. Subsequent calls should be faster, but they may not be as fast as reading this value from a datastore entity property or memcache. Since this value doesn’t change, it’s a good idea to store it. On the local dev server, this URL looks something like this:


In production, however, this will return a URL that looks like this:


(Cute dogs below)

You’ve already saved yourself the trouble of writing a handler. What’s really nice about this URL is that you can perform operations on it just by appending parameters. Let’s say we wanted to resize our image so it’s no larger than 200×200 while retaining its aspect ratio. We’d simply append “=s200” to the end of the image URL:


(Looks like this)

We can also crop the image by appending a “-c” to the size parameter:


(Looks like this – compare with above)

Note that we can also generate these URLs programmatically using the overloaded version of getServingUrl that also accepts a size and crop parameter.

Adding GWT

So now that we’ve got all that done, let’s get it working with GWT. It’s important that we understand how it all works, because GWT’s single-page, Javascript-generated content model must be taken into account. Let’s draw our upload widget. We’ll be using UiBinder:

We’ll create our Composite class as follows:

public class UploadPhoto extends Composite {

    private static UploadPhotoUiBinder uiBinder = GWT.create(UploadPhotoUiBinder.class);

    UserImageServiceAsync userImageService = GWT.create(UserImageService.class);

    interface UploadPhotoUiBinder extends UiBinder<Widget, UploadPhoto> {}

    @UiField
    Button uploadButton;

    @UiField
    FormPanel uploadForm;

    @UiField
    FileUpload uploadField;

    public UploadPhoto(final LoginInfo loginInfo) {
        initWidget(uiBinder.createAndBindUi(this));
    }
}


Here’s the corresponding XML file:

<!DOCTYPE ui:UiBinder SYSTEM "http://dl.google.com/gwt/DTD/xhtml.ent">
<ui:UiBinder xmlns:ui="urn:ui:com.google.gwt.uibinder"
	xmlns:g="urn:import:com.google.gwt.user.client.ui">
	<g:FormPanel ui:field="uploadForm">
		<!-- FormPanel takes a single child, so wrap the widgets in a panel -->
		<g:VerticalPanel>
			<g:FileUpload ui:field="uploadField"></g:FileUpload>
			<g:Button ui:field="uploadButton"></g:Button>
		</g:VerticalPanel>
	</g:FormPanel>
</ui:UiBinder>
(We’ll add more to this later)

When we discussed the Blobstore, we mentioned that each upload form has a different POST location corresponding to the upload session. We’ll have to add a GWT-RPC component to generate and return a URL. Let’s do that now:

// UserImageService.java
public interface UserImageService extends RemoteService {
    public String getBlobstoreUploadUrl();
}

Our IDE will nag us to generate the corresponding Async interface if we have a GWT plugin:

// UserImageServiceAsync.java
public interface UserImageServiceAsync {
    public void getBlobstoreUploadUrl(AsyncCallback<String> callback);
}

We’ll need to write the code on the server side:

// UserImageServiceImpl.java
public class UserImageServiceImpl extends RemoteServiceServlet implements UserImageService {

    public String getBlobstoreUploadUrl() {
        BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
        return blobstoreService.createUploadUrl("/upload");
    }
}


This is pretty straightforward. We’ll want to invoke this service on the client side when we build the form. Let’s add this to UploadPhoto:

public class UploadPhoto extends Composite {

    private static UploadPhotoUiBinder uiBinder = GWT.create(UploadPhotoUiBinder.class);
    UserImageServiceAsync userImageService = GWT.create(UserImageService.class);

    interface UploadPhotoUiBinder extends UiBinder<Widget, UploadPhoto> {}

    @UiField
    Button uploadButton;

    @UiField
    FormPanel uploadForm;

    @UiField
    FileUpload uploadField;

    public UploadPhoto() {
        initWidget(uiBinder.createAndBindUi(this));

        // Disable the button until we get the URL to POST to
        uploadButton.setEnabled(false);
        uploadForm.setEncoding(FormPanel.ENCODING_MULTIPART);
        uploadForm.setMethod(FormPanel.METHOD_POST);

        // Now we use our GWT-RPC service and get an URL
        startNewBlobstoreSession();

        // Once we've hit submit and it's complete, let's set the form to a new session.
        // We could also have probably done this in the onClick handler
        uploadForm.addSubmitCompleteHandler(new FormPanel.SubmitCompleteHandler() {

            public void onSubmitComplete(SubmitCompleteEvent event) {
                startNewBlobstoreSession();
            }
        });
    }

    private void startNewBlobstoreSession() {
        userImageService.getBlobstoreUploadUrl(new AsyncCallback<String>() {

            public void onSuccess(String result) {
                uploadForm.setAction(result);
                uploadButton.setEnabled(true);
            }

            public void onFailure(Throwable caught) {
                // We probably want to do something here
            }
        });
    }

    @UiHandler("uploadButton")
    void onSubmit(ClickEvent e) {
        uploadForm.submit();
    }
}

This is fairly standard GWT RPC.

So that concludes the GWT part of it. We mentioned an upload callback. Let’s implement that now:

/**
 * @author Ikai Lan
 *
 *         This is the servlet that handles the callback after the blobstore
 *         upload has completed. After the blobstore handler completes, it POSTs
 *         to the callback URL, which must return a redirect. We redirect to the
 *         GET portion of this servlet, which sends back a key. GWT needs this
 *         key to make another request to get the image serving URL. This adds
 *         an extra request, but the reason we do this is so that GWT has a key
 *         to work with to manage the Image object. Note the content-type. We
 *         *need* to set this to get this to work. On the GWT side, we'll take
 *         this and show the image that was uploaded.
 */
public class UploadServlet extends HttpServlet {
	private static final Logger log = Logger.getLogger(UploadServlet.class
			.getName());

	private BlobstoreService blobstoreService = BlobstoreServiceFactory
			.getBlobstoreService();

	public void doPost(HttpServletRequest req, HttpServletResponse res)
			throws ServletException, IOException {

		Map<String, BlobKey> blobs = blobstoreService.getUploadedBlobs(req);
		BlobKey blobKey = blobs.get("image");

		if (blobKey == null) {
			// Uh ... something went really wrong here
			log.severe("No image blob was found in the upload callback");
		} else {

			ImagesService imagesService = ImagesServiceFactory
					.getImagesService();

			// Get the image serving URL
			String imageUrl = imagesService.getServingUrl(blobKey);

			// For the sake of clarity, we'll use low-level entities
			Entity uploadedImage = new Entity("UploadedImage");
			uploadedImage.setProperty("blobKey", blobKey);
			uploadedImage.setProperty(UploadedImage.CREATED_AT, new Date());

			// Highly unlikely we'll ever filter on this property
			uploadedImage.setUnindexedProperty("imageUrl", imageUrl);

			DatastoreService datastore = DatastoreServiceFactory
					.getDatastoreService();
			datastore.put(uploadedImage);

			res.sendRedirect("/upload?imageUrl=" + imageUrl);
		}
	}

	protected void doGet(HttpServletRequest req, HttpServletResponse resp)
			throws ServletException, IOException {

		String imageUrl = req.getParameter("imageUrl");
		resp.setHeader("Content-Type", "text/html");

		// This is a bit hacky, but it'll work. We'll use this key in an Async
		// service to fetch the image and image information
		resp.getWriter().print(imageUrl);
	}
}

We’ll probably want to display the image we just uploaded in the client. Let’s add a few lines of code to the SubmitCompleteHandler we registered to do this:

	public void onSubmitComplete(SubmitCompleteEvent event) {

		// This is what gets the result back - the content-type *must* be
		// text/html
		String imageUrl = event.getResults();
		Image image = new Image();
		image.setUrl(imageUrl);

		final PopupPanel imagePopup = new PopupPanel(true);
		imagePopup.setWidget(image);

		// Add some effects
		imagePopup.setAnimationEnabled(true); // animate opening the image
		imagePopup.setGlassEnabled(true); // darken everything under the image
		imagePopup.setAutoHideEnabled(true); // close image when the user clicks
												// outside it
		imagePopup.center(); // center the image and show it
	}

And we’re done!

Get the code

I’ve got the code for this project here:


Just a warning: this is a bit different from the sample code above. I wrote this post after I wrote the code, extrapolating the bare minimum to make this work. The full project has experimental tagging and deleting, and handles logins. I’m adding features to it simply to see what else can be done, so expect changes. I’m aware of a few of the bugs in the code, and I’ll get around to fixing them, but again, it’s a demo project, so keep realistic expectations. As far as I can tell, however, the code above should be runnable locally and deployable (once you have enabled billing for the blobstore).

Happy developing!

Written by Ikai Lan

September 8, 2010 at 5:00 pm

Posted in App Engine, Java