Ikai Lan says

I say things!

Archive for the ‘Software Development’ Category

Use the testify “require” library for Go assertions instead of the default one

Love assertions in your tests? You’re probably using this library:

import "github.com/stretchr/testify/assert"
One thing that confused me about this library is that assert.NotNil(), assert.NoError() and other methods return a boolean instead of failing the test. For instance, this test does not fail:

func TestStringCannotCastToUint(t *testing.T) {
    var s interface{}
    s = "can't cast this to int"
    _, err := s.(uint16)
    assert.NotNil(t, err)
}

It turns out the default assert library doesn’t fail on these boolean asserts. To force it to fail, change the import to this:

import (
    assert "github.com/attic-labs/testify/require"
)

You’ve now aliased require, which does fail the test, to assert, with no other code changes. Enjoy!
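As an aside, note that the second value of a comma-ok type assertion is a bool, not an error, so naming it err (as in the test above) can mislead. A tiny self-contained sketch of that behavior (the helper name is made up for illustration):

```go
package main

import "fmt"

// castsToUint16 reports whether the dynamic type of v is uint16,
// using the comma-ok form of a type assertion. The second return
// value of the assertion is a bool, not an error.
func castsToUint16(v interface{}) bool {
	_, ok := v.(uint16)
	return ok
}

func main() {
	var s interface{} = "can't cast this to int"
	fmt.Println(castsToUint16(s)) // prints false: the string does not cast
}
```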

 P.S. Does the tag work for Go? I’ve tried "go" and "golang" with no luck.

Written by Ikai Lan

November 7, 2017 at 10:21 am

Is go test running slowly with your sqlite3 tests?

I have to look this up every time I clean up my environment, so I’ll write a quick post about it.

I love go test. It runs fast … except when it doesn’t. I noticed after adding a sqlite3 dependency for unit testing that it started to take 30 seconds to run.

Digging into it, I found a discussion on golang-nuts. Basically: the root cause is that the 5MB sqlite3 .c file was being recompiled on every test run.

The suggestion is to use go install on the sqlite3 dependency. If this works for you, great. It didn’t seem to work for me – possibly because I’m using vendoring and govend? I’m not sure. I didn’t spend too much time digging into it.

The magic command for me was this:

go test -i -v

That installed the dependencies, which made subsequent runs of go test much faster.

Written by Ikai Lan

November 2, 2017 at 2:57 pm

Recreating the create statement for a Redshift Spectrum table

(I don’t have time to write my usual long posts, so here’s a quick one to try to get me back into the habit of technical blogging)

All of the information needed to reconstruct the create statement for a Redshift Spectrum table is available via the svv_external_tables and svv_external_columns views. Reconstructing the create statement is slightly annoying if you’re just using select statements. SO: Here is a quick and dirty Python script that does an okay but imperfect job of this:

# Copyright (c) 2017 Ikai Lan
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# To use this script:
# pip install psycopg2
from __future__ import print_function

import psycopg2

PG_PORT = 5439

# Fill these in for your external schema and table.
SCHEMA_NAME = 'my_external_schema'
TABLE_NAME = 'my_table'

TABLE_SCHEMA_QUERY = """
SELECT columnname, external_type
FROM svv_external_columns
WHERE schemaname=(%s) AND tablename=(%s)
ORDER BY columnnum ASC"""

TABLE_LOCATION_QUERY = """
SELECT location
FROM svv_external_tables
WHERE schemaname=(%s) AND tablename=(%s)"""


def main():
    # Fill in host, user, and password for your cluster.
    conn = psycopg2.connect(dbname='dev',
                            host='your-cluster-endpoint',
                            port=PG_PORT,
                            user='your-user',
                            password='your-password')
    table_name = TABLE_NAME

    cur = conn.cursor()
    cur.execute(TABLE_SCHEMA_QUERY, (SCHEMA_NAME, table_name,))
    schema = cur.fetchall()

    cur = conn.cursor()
    cur.execute(TABLE_LOCATION_QUERY, (SCHEMA_NAME, table_name,))
    location = cur.fetchone()

    create_table_lines = []
    create_table_lines.append(
        'CREATE EXTERNAL TABLE {schemaname}.{tablename}('.format(
            schemaname=SCHEMA_NAME, tablename=table_name))

    column_lines = []
    for column, column_type in schema:
        column_lines.append(
            '{column} {column_type}'.format(column=column, column_type=column_type))
    create_table_lines.append(',\n'.join(column_lines) + ')')

    # TODO read the format from svv_external_tables.input_format and branch on that code
    create_table_lines.append('STORED AS parquet')
    create_table_lines.append("LOCATION '{location}'".format(location=location[0]))

    create_table_statement = '\n'.join(create_table_lines) + ';'
    print(create_table_statement)


if __name__ == '__main__':
    main()

Take and modify for your needs. I even prepended an MIT license to it for you overly cautious big company cats that may need to run it by legal or whatever.

Another cool approach if you’re using Parquet would be to use either parquet or fastparquet to read the schema from the parquet file in S3 and generate a create table statement based on that. I didn’t write that tool, but if you do, go ahead and let me know and I’ll link to you.
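The DDL-generation half of that hypothetical tool is pure string work. Here’s a sketch of just that half (the function name is made up, and reading the actual column list out of a parquet file via fastparquet or similar is left out):

```python
def create_table_statement(schema_name, table_name, columns, location):
    """Build a Redshift Spectrum CREATE EXTERNAL TABLE statement.

    columns is a list of (name, type) pairs, e.g. as derived from a
    parquet file's schema.
    """
    column_lines = ',\n'.join(
        '    {0} {1}'.format(name, col_type) for name, col_type in columns)
    return (
        'CREATE EXTERNAL TABLE {0}.{1}(\n{2})\n'
        'STORED AS parquet\n'
        "LOCATION '{3}';".format(schema_name, table_name, column_lines, location))


print(create_table_statement(
    'spectrum', 'events',
    [('id', 'bigint'), ('name', 'varchar(256)')],
    's3://my-bucket/events/'))
```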

Written by Ikai Lan

August 16, 2017 at 10:51 am

Debugging your Google OAuth 2.0 token when you get HTTP 401s or 403s

with one comment

One of the things I get asked about the most is OAuth 2.0 when developers start seeing 401s, 403s, and possibly other HTTP 4xx status codes. This post isn’t meant to be a comprehensive guide on OAuth debugging for Google/YouTube APIs. Rather, it’s a collection of some of the steps I find myself recommending or repeating when I’m trying to debug issues with OAuth authorization.

Enabling the APIs

I’ve recorded a short video describing how to enable Google API access for use with web and installed apps:

The checklist is:

  1. Did you create a new API project?
  2. Did you enable the APIs you are looking to use?
  3. Did you create a client ID and client secret?

A common mistake developers make is forgetting step #2 – enabling the APIs.

Getting a token

Here are my cliff notes on tokens:

  • Access tokens – used to make API calls. These expire after an hour
  • Refresh tokens – you only get these if you request offline access when you ask a user to authorize. These are exchanged by your client for access tokens. Refresh tokens generally don’t expire.

What causes 401s and other 4xx status codes?

The common causes for “401 Unauthorized” when making API calls with an access token are:

  • expired access token (most common)
  • Developer accidentally disabled the APIs (uncommon)
  • User revokes token (rare)

Sometimes, more explanation exists in the response body of an HTTP 4xx. In the Java client, for example, you should log the error, because it will assist in troubleshooting:

try {
    // Make your Google API call
} catch (GoogleJsonResponseException e) {
    GoogleJsonError error = e.getDetails();
    // Print out the message and errors
    System.err.println(error.getMessage());
    System.err.println(error.getErrors());
}

Different versions will have different API signatures. See the Javadocs for the current version (1.1.15) – newer versions look like they might be deviating from this a bit.

Troubleshooting the token

Keep this tool in your toolbox: the tokenInfo API call. Take your access token and plug it in to the end of this URL:

https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=YOUR_ACCESS_TOKEN

You could take your existing code and make an API call here whenever you get an HTTP 4xx and log that response. This’ll return some useful information:

  • When the token expires
  • What’s the token’s scope (this is important)
  • If the token is invalid

If you didn’t write the code yourself and inherited it:

  • Whether this access token came from an offline refresh_token or not (the “offline” field)

If the token is invalid … well, that doesn’t help a lot. I would troubleshoot like this:

  1. Remove the access token from your datastore or database.
  2. Use the refresh token to acquire a new access token (if you are using a refresh token)
  3. Try to make the API call again. If it works, you’re good! If not …
  4. Check the access token against the tokenInfo API
  5. If it’s still invalid, do a full reauth
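Step 4 is easy to automate, since the check is just an HTTP GET against the tokenInfo endpoint. A sketch of building that request URL (the class and method names here are made up, and I’m assuming the v1 tokeninfo endpoint):

```java
// Sketch: build the tokenInfo request URL for a given access token,
// ready to fetch with your HTTP client of choice and log on any 4xx.
public class TokenInfoUrl {
    static final String TOKENINFO_ENDPOINT =
            "https://www.googleapis.com/oauth2/v1/tokeninfo";

    static String forAccessToken(String accessToken) {
        return TOKENINFO_ENDPOINT + "?access_token=" + accessToken;
    }

    public static void main(String[] args) {
        // A made-up token for illustration; plug in a real one to debug.
        System.out.println(forAccessToken("ya29.EXAMPLE"));
    }
}
```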

Hope this helps!

Written by Ikai Lan

July 19, 2013 at 12:02 pm

Posted in Software Development


Why “The Real Reason Silicon Valley Coders Write Bad Software” is wrong

with 12 comments

There was an article in The Atlantic this morning titled, “The Real Reason Silicon Valley Coders Write Bad Software” with the tagline, “If someone had taught all those engineers how to string together a proper sentence, Windows Vista would be a lot less buggy.” The author, Bernard Meisler, seems to think that the cause of “bad software” is, and I quote:

“But the downfall of many programmers, not just self-taught ones, is their lack of ability to sustain complex thought and their inability to communicate such thoughts. That leads to suboptimal code, foisting upon the world mediocre (at best) software like Windows Vista, Adobe Flash, or Microsoft Word.”

Not only are the conclusions of the article inaccurate, but it also paints a negative portrayal of software engineers that isn’t grounded in reality. For starters, there is a distinction between “bad” software and “buggy” software. Software that is bad tends to be a result of poor usability design. Buggy software, on the other hand, is a consequence of a variety of factors stemming from the complexity of modern software. The largest reduction in the number of bugs isn’t going to come from improving the skills of individual programmers, but rather, from instituting quality control processes throughout the software engineering lifecycle.

Bad Software

Bad software is software that, for whatever reason, does not meet the expectation of users. Software is supposed to make our lives simpler by crunching data faster, or automating repetitive tasks. Great software is beautiful, simple, and, given some input from a user, produces correct output using a reasonable amount of resources. When we say software is bad, we mean any combination of things: it’s not easy to use. It gives us the wrong output. It uses resources poorly. It doesn’t always run. It doesn’t do the right thing. Bugs contribute to a poor user experience, but are not the sole culprit for the negative experiences that users have with software. Let’s take one of the examples Meisler has cited: Windows Vista. A quick search for “why does windows vista suck” in Google turns up these pages:


Oh, bugs are on there, and they’re pretty serious. We’ll get to that. But what else makes Windows Vista suck, according to those sites? Overly aggressive security prompts. Overly complex menus (we’ll visit the idea of complexity again later, I promise). None of the menus make sense. Changed network interface. Widgets that are too small to be usable shipping with the system. Rearranged menus. Search that only works on 3 folders. Those last few things aren’t bugs, they’re usability problems. The software underneath is, for the most part, what we software engineers call working as intended. Some specification somewhere designed those features to work that way, and the job of the software engineers is, in many companies, to build the software to that specification, as ridiculous as the specification is. One of my coworkers points out that Alan Cooper, creator of Visual Basic, wrote a great book about this subject titled, “The Inmates are Running The Asylum”. Interestingly enough, his argument is that overly technical people are harmful when they design user interactions, and this results in panels like the Windows Vista search box with fifty different options. But, to be fair, even when user interactions are designed well, just making the software do what the user expects is hard. A simple button might be hiding lots of complex interactions underneath the hood to make the software easy to use. The hilarious and often insightful Steve Yegge talks about just that in a tangentially related post about complexity in software requirements. Software is generally called “bad” when it does not do what the user expects, and this is something that is really hard to get right.

Buggy Software

Buggy software, on the other hand, is software that does not behave as the software engineer expects or the specification dictates. This is in stark contrast to bad software, which is software that does not behave the way a user expects. There’s often overlap. A trivial example: let’s suppose an engineer writes a tip calculator for mobile phones that allows a user to enter a dollar amount, and press a “calculate” button, which then causes the application to output 15% of the original bill amount on the screen. Let’s say a user uses the application, enters $100, and presses calculate. The amount that comes out is $1500. That’s not 15%! Or – the user presses calculate, and the application crashes. The user expects $15, but gets $1500. Or no result, because the application ends before it presents output.

Software is buggy partially because of bad documentation, as Meisler asserts, but not primarily because of it. Software isn’t even buggy because programmers can’t express “complex thoughts”, another of Meisler’s gems; all of programming is the ability to “combine simple ideas into compound ideas”. Software is buggy because of problems stemming out of complexity.

All software is built on top of abstractions. That is, someone else is responsible for abstracting away the details such that a programmer does not need to fully understand another system to be able to use it. As a programmer, I do not need to understand how my operating system communicates with the hard drive to save a file, or how my hard disk manages its read/write heads. I don’t need to write code that says, “move the write head” – I write code that says, “write this data to a file to this directory”.  Or maybe I just say, “just save this data” and never worry about files or directories. Abstractions in software are kind of like the organizational structure of a large company. The CEO of a large car manufacturer talks to the executive board about the general direction of the company. The executive staff then break these tasks down into more specific focus area goals for their directors, who then break these tasks into divisional goals for the managers, who then break these tasks into team goals and tasks for the individual employees that actually build, design, test, market, and sell the damn things. To make this more convoluted, it’s not all from top to bottom communication, either. There are plenty of cross team interactions, and interactions between layers of management that cross the chain of command.

To say that poor documentation is the primary source of bugs is laughable. Documentation exists to try to make sense of the complexity, but there is no way documentation can be comprehensible in any reasonably complex software with layers of abstraction, because, as Joel Spolsky, founder of Fog Creek Software says, abstractions leak. Programmers cannot know all the different ways abstractions they are depending on will fail, and thus, they cannot possibly report or handle all the different ways the abstraction they are working on will fail. More importantly: programmers cannot know how every possible combination of abstractions they are depending on will produce subtly incorrect results that result in more and more warped results up the abstraction stack. It’s like the butterfly effect.  By the time a bug surfaces, a programmer needs to chase it all the way down the rabbit hole, often into code he does not understand.  Documentation helps, but no programmer reads documentation all the way down to the bottom of the stack before he writes code. It’s not commercially feasible for programmers to do this and retain all the information a priori. Non-trivial software is complex as hell underneath the hood, and it doesn’t help that even seemingly simple software often has to turn water into wine just to try to do what a user expects.

Software engineers and critical thinking

I don’t deny the importance of writing or critical thinking skills. They are crucial. I wouldn’t be surprised if the same ability to reason through complex thoughts allows people to write well as well as program well. But to assert that writing skills lead to reasoning skills? This is a case of placing the cart before the horse. Meisler is dismissive of the intellectual dexterity needed to write programs:

“Most programmers are self-taught and meet the minimum requirement for writing code — the ability to count to eight”

It’s not true. Programming often involves visualizing very abstract data structures, multivariate inputs/outputs, dealing with non-deterministic behavior, and simulating the concurrent interactions between several moving parts in your mind. When I am programming, I am holding in my mental buffer the state of several objects and context switching several times a second to try to understand how a small change I make in one place will ripple outwards. I do this hundreds of times a session. It’s a near trance-like state that takes me some time to get into before I am working at full speed, and it’s why programming is so damned hard. It’s why I can’t be interrupted and need a contiguous block of time to be fully effective on what Paul Graham calls the maker’s schedule. I’m not the only one who feels this way – many other programmers report experiencing the mental state that psychologists refer to as “flow” when they are performing at their best.

How to reduce the incidences of bugs

Philippe Beaudoin, a coworker, writes:

I like to express the inherent complexity of deep software stacks with an analogy, saying that software today is more like biology than mathematics. Debugging a piece of software is more like an episode of House than a clip from A Beautiful Mind. Building great software is about having both good bug prevention processes (code reviews, tests, documentation, etc.) as well as good bug correction processes (monitoring practices, debugging tools).

Trying to find a single underlying cause to buggy software is as preposterous as saying there is a single medical practice that would solve all of earth’s health problems.

Well said.

I’m disappointed in the linkbait title, oversimplification, and broad sweeping generalizations of Bernard Meisler’s article. I’m disappointed that this is how software engineering is being represented to a mainstream, non-techie audience. It’s ironic that the article touts writing skills, but is poorly structured in arguing a point. It seems to conclude that writing skills are the reason code is buggy. No wait – critical thinking. Ah! Nope, surprise, writing skills, and a Steve Jobs quote that is used in a misleading way and taken out of context mixed in for good measure. He argues for the logic of language, but as many of us who also write for fun and profit know, human language is fraught with ambiguity and there’s a lot less similarity between prose and computer programming languages than Meisler would have the mainstream audience believe. I’m sorry, Herr Meisler, but if your article were a computer program, it simply wouldn’t compile.

— Ikai

Written with special thanks to Philippe Beaudoin, Marvin Gouw, Alejandro Crosa, and Tansy Woan.

Written by Ikai Lan

October 9, 2012 at 11:14 pm

Clearing up some things about LinkedIn mobile’s move from Rails to node.js

with 13 comments

There’s an article on highscalability that’s talking about the move from Rails to node.js (for completeness: its sister discussion on Hacker News). It’s not the first time this information has been posted. I’ve kind of ignored it for now (because I didn’t want to be this guy), but it’s come up enough times and no one has spoken up, so I suppose it’s up to me to clear a few things up.

I was on the team at LinkedIn that was responsible for the mobile server, and while I wasn’t the primary contributor to that stack, I built and contributed several things, such as the unfortunate LinkedIn WebOS app which made use of the mobile server (and a few features) and much of the initial research behind productionizing JRuby for web applications (I did much more stuff that wasn’t published). I left LinkedIn in 2009, so I apologize if any new information has surfaced. My hunch is that even if I’m off, I’m not off by that much.

Basically: the article is leaving out several facts. We can all learn something from the mobile server and software engineering if we know the full story behind the whole thing.

In 2008, I joined a software engineering team at LinkedIn that was focused on building things outside the standard Java stack. You see, back then, to develop code for linkedin.com, you needed a Mac Pro with 6 gigs of RAM just to run your code. And those requirements kept growing. If my calculations are correct, the standard setup for engineers now is a machine with 20 or more gigabytes of RAM just to RUN the software. In addition, each team could only release once every 6 weeks (this has been fixed in the last few years). It was deemed that we needed to build out a platform off the then-fledgling API and start creating teams to get off the 6 week release cycle so we could iterate quickly on new features. The team I was on, LED, was created for this purpose.

Our first project was a rotating globe that showed off new members joining LinkedIn. It used to run on Poly9, but when they got shut down, it looks like someone migrated it to use Google Earth. The second major project was m.linkedin.com, a mobile web client for LinkedIn that would be one of the major clients of our fledgling API server, codenamed PAL. Given that we were building out an API for third parties, we figured that we could eat our own dogfood and build out LinkedIn for mobile phones with browsers. This is 2008, mind you. The iPhone just came out, and it was a very Blackberry world.

The stack we chose was Ruby on Rails 1.2, and the deployment technology was Mongrel. Remember, this is 2008. Mongrel was cutting edge Ruby technology. Phusion Passenger wasn’t released yet (more on this later), and Mongrel was light-years ahead of FastCGI. The problem with Mongrel? It’s single-threaded. It was deemed that the cost of shipping fast was more important than CPU efficiency, a choice I agreed with. We were one of the first products at LinkedIn to do i18n (well, we only did translations) via gettext. We deployed using Capistrano, and were the first ones to use nginx. We did a lot of other cool stuff, like experiment with Redis, learn a lot about memcached in production (nowadays this is a given, but there was a lot of memcached vs EHCache talk back then). Etc, etc. But I’m not trying to talk about how much fun I had on that team. Well, not primarily.

I’m here to clear up facts about the post about moving to node.js. And to do that, I’m going to go back to my story.

The iPhone SDK had shipped around that time. We didn’t have an app ready for launch, but we wanted to build one, so our team did, and we inadvertently became the mobile team. So suddenly, we decided that this array of Rails servers that made API calls to PAL (which was, back then, using a pre-OAuth token exchange technology that was strikingly similar to OAuth) would also be the primary API server for the iPhone client and any other rich mobile client we’d end up building, this thing that was basically using Rails rhtml templates. We upgraded to Rails 2.x+ so we could have the respond_to directive for different outputs. Why didn’t we connect the iPhone client directly to PAL? I don’t remember. Oh, and we also decided to use OAuth for authenticating the iPhone client. Three legged OAuth, so we also turned those Rails servers into OAuth providers. Why did we use 3-legged OAuth? Simple: we had no idea what we were doing. I’LL ADMIT IT.

Did I mention that we hosted outside the main data centers? This is what Joyent talks about when they say they supplied LinkedIn with hosting. They never hosted linkedin.com proper on Joyent, but we had a long provisioning process for getting servers in the primary data center, and there were these insane rules about no scripting languages in production, so we decided it was easier to adopt an outside provider when we needed more capacity.

Here’s what you were accessing if you were using the LinkedIn iPhone client:

iPhone -> m.linkedin.com (running on Rails) -> LinkedIn’s API (which, for all intents and purposes, only had one client, us)

That’s a cross data center request, guys. Running on single-threaded Rails servers (every request blocked the entire process), running Mongrel, leaking memory like a sieve (this was mostly the fault of gettext). The Rails server did some stuff, like translations, and transformation of XML to JSON, and we tested out some new mobile-only features on it, but beyond that it didn’t do a lot. It was a little more than a proxy. A proxy with a maximum concurrency factor dependent on how many single-threaded Mongrel servers we were running. The Mongrels, as we affectionately referred to them, often bloated up to 300MB of RAM each, so we couldn’t run many of them.

At this time, I was busy productionizing JRuby. JRuby, you see, was taking full advantage of Rails’ ability to serve concurrent requests using JVM concurrency. In addition, JRuby outperformed MRI in almost every real benchmark I threw at it – there were maybe 1 or 2 specific benchmarks when it didn’t. I knew that if we ported the mobile server to JRuby, we would have gotten better performance and way more concurrency. We would have kept the same ability to deploy fast with the option to in-line into many of the Java libraries LinkedIn was using.

But we didn’t. Instead, the engineering manager at the time ruled in favor of Phusion Passenger, which, to be fair, was an easier port than JRuby. We had come to depend on various native extensions, gettext being the key one, and we didn’t have time to port the translations to something that was JRuby friendly. I was furious, of course, because I had been evangelizing JRuby as the best Ruby production environment and no one was listening, but that’s a different story for a different time. Well, maybe some people listened; those Square guys come to mind.

This was about the time I left LinkedIn. As far as I know, they didn’t build a ton more features. Someone told me that one of my old teammates suddenly became fascinated with node.js, and pretty much singlehandedly decided to rewrite the mobile server using node. Node was definitely a better fit for what we were doing, since we were constantly blocking on a cross data center call, and a non-blocking server has been shown to be highly advantageous for IO-bound work like that. Not to mention: we never intended for the original Ruby on Rails server to be used as a proxy for several years.

So, knowing all the facts, what are all the takeaways?

  • Is v8 faster than MRI? MRI is generally slower than YARV (Ruby 1.9), and, at least in these benchmarks, I don’t think there is any question that v8 is freakin’ fast. If node.js blocked on I/O, however, this fact would have been almost completely irrelevant.
  • The rewrite factor. How many of us have been on a software engineering project where the end result looked nothing like what we planned to build in the first place? And, knowing fully the requirements, we know that, if given time and the opportunity to rebuild it from scratch, it would have been way better? Not to mention: I grew a lot at LinkedIn as a software engineer, so the same me several years later would have done a far better job than the same me in 2008. Experience does matter.
  • I see that one of the advantages of the mobile server being in node.js is people could “leverage” (LinkedIn loves that word) their Javascript skills. Well, LinkedIn had/has hundreds of Java engineers! If that was a concern, we would have spent more time exploring Netty. Lies, damn lies, and benchmarks, I always say, but I think it’s safe for us to say that Netty (this is vertx, which sits on top of Netty) is at least as fast as node.js for web serving.
  • Firefighting? That was probably a combination of several things: the fact that we were running MRI and leaked memory, or the fact that the ops team was 30% of a single guy.

What I’m saying here is use your brain. Don’t read the High Scalability post and assume that you must build your next technology using node.js. It was definitely a better fit than Ruby on Rails for what the mobile server ended up doing, but it is not a performance panacea. You’re comparing a lower level server to a full stack web framework.

That’s all for tonight, folks, and thank you internet for finally goading me out of hiding again.

– Ikai

Written by Ikai Lan

October 4, 2012 at 6:34 pm

Apps Script quick tips: building a stock price spreadsheet

with 4 comments

I’ve been using iGoogle less and less over the past few years. A few weeks ago, the team announced that iGoogle would be shutting down in November 2013. It’s not a huge loss to me, though I do check iGoogle several times a day. Why? Stock prices! I’ve been using the Stock Market gadget for years.

As it turns out, the functionality I want is very easy to replicate using Google Spreadsheets and Google Apps Script. I’m thoroughly convinced that the fastest way to wire up different Google services for custom functionality is this product. Google Apps Script provides services to access Google Finance APIs.

Knowing this, it’s incredibly easy to wire up a spreadsheet that has access to live finance data. The spreadsheet I use looks something like this:


We can pull this off in a few very easy steps.

Step 1: Create a spreadsheet.

I made a spreadsheet with the following column names:

Symbol Price Change Change % Details


My intended use is to populate the Symbol column and have the rest of the data in the other columns auto populated. The nice thing about writing scripts that integrate with spreadsheets is that we have a built in UI for making edits, sorting, filtering and searching. By using spreadsheets as our data entry and manipulation UI, our functionality is already more advanced than the functionality provided in the Stock Market gadget as well as many other online portfolio-at-a-glance services.

Step 2: Create the script

What we’re going to do is write a few functions in the Script Editor. Spreadsheet cells can accept both the standard set of built-in functions that do simple things like SUM, AVG, and so forth, but they can also accept custom functions that retrieve data from other Google services.

Click Tools -> Script Editor.


This will open up a new tab in your browser where you can write code. The default name of this file is Code.gs. Replace whatever is in the buffer with this:

function getStockPrice(symbol) {
  return FinanceApp.getStockInfo(symbol)["price"];
}

function getStockPriceChangePct(symbol) {
  return FinanceApp.getStockInfo(symbol)["changepct"];
}

function getStockPriceChange(symbol) {
  return FinanceApp.getStockInfo(symbol)["change"];
}

function getGoogleFinanceLink(symbol) {
  return "http://www.google.com/finance?q=" + symbol;
}

Your Script Editor should look like this:


FinanceApp.getStockInfo() returns a FinanceResult instance with a LOT of data. I only care about the basics: price, price change, and price change percentage. The functions I’ve defined reflect this.

Step 3: Add the functions into the cells

Now let’s go back to the spreadsheet tab. I’ve populated a few basic symbols under the Symbol column: GOOG (Google) and AAPL (Apple), two of my favorite companies. In column B2, enter this value:


Hit enter. If everything is working correctly, this will now populate with the latest price of whatever stock symbol is in A2. Let’s add the rest of the functions. In C2, enter:




I like to have a link back to Google Finance if I ever want to do more research on a company, so in E2, add:


This next part is hard to explain but shouldn’t be difficult for anyone who has used a spreadsheet program before. Highlight cells B2-E2. You can hold down shift and select them. Now hover your mouse over the bottom right corner of E2 and drag down a few rows. What this does is it copies the functions for the subsequent rows, but it substitutes A2 for A3, A4, A5, … depending on what row you happen to be in. You can test this out by adding additional stock symbols. The live stock data will appear.

Step 4: Color Coding

I like to see color coding depending on whether a stock price has risen or fallen. Hold down shift and click on C, then D at the top of the columns:


Click the arrow to the right. This should drop down a menu. Click on “Conditional Formatting”:


You’ll want to add two rules: a greater than rule and a less than rule. When the Change and Change % columns are greater than 0, change the background to green. When they are less than 0, change the backgrounds to red. Click “Save Rules”

You’re done!


I’ve only scratched the surface of what can be done with Apps Script. We haven’t even gotten into a lot of the other cool things we can do. Using Clock Events, we can check every few minutes for changes and email ourselves using the GmailApp library if a stock price change is greater than some threshold. We can generate charts based on historic data. And so on, and so forth. For more examples of things that can be done with Google Apps Script, check out the tutorials section.

Have a great weekend!

– Ikai


Written by Ikai Lan

July 27, 2012 at 2:05 pm

Getting started with jOOQ: A Tutorial

with 10 comments


I accidentally stumbled onto jOOQ a few days ago while doing a lot of research on Hibernate. Funny how things work, isn’t it? For those of you that aren’t familiar with it, jOOQ is a different approach to the over-ORMing of Java persistence. Rather than try to map database tables to Java classes and abstract away the SQL underneath, jOOQ assumes you want low-level control over the SQL queries you execute, and provides a mostly typesafe interface for executing queries. I don’t have anything against simple ORMs, but it’s good to have the right tool for the right job. From the jOOQ homepage:

Instead of this SQL query:


You would execute this Java code:


Why a Java interface? Type safety, for one. Programmatically using jOOQ’s DSL has some advantages over writing SQL queries by hand, such as IDE support and compile time checking of some things.

The idea interested me and I dug in. Unfortunately, the jOOQ site’s documentation, while fairly comprehensive, DOES NOT PROVIDE AN END-TO-END “GETTING STARTED” PAGE!!! This means that if you want to learn jOOQ, you’ll have to jump to the chapter about meta model code generation, then jump to the DSL, then jump to the jOOQ classes section. It’s a bit of a mess for new users. Google search also didn’t turn up many useful results, so I figured I’d whip up a quick “Getting started” guide. We’re going to go over the following steps:

Preparation: Download jOOQ and your SQL driver
Step 1: Create a SQL database and a table
Step 2: Generate classes
Step 3: Write a main class and establish a MySQL connection
Step 4: Write a query using jOOQ’s DSL
Step 5: Iterate over results
Step 6: Profit!

Ready? Let’s get started.

Getting our hands dirty

Preparation: Download jOOQ and your SQL driver

If you haven’t already downloaded them, download jOOQ:


For this example, we’ll be using MySQL. If you haven’t already downloaded MySQL Connector/J, download it here:


Stash these somewhere where you can get to them later.

Step 1: Create a SQL database and a table

We’re going to create a database called “guestbook” and a corresponding “posts” table. Connect to MySQL via your command line client and type the following:

create database guestbook;

CREATE TABLE `posts` (
  `id` bigint(20) NOT NULL,
  `body` varchar(255) DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);
(I copied and pasted the create table statement from a “show create table” command)

Step 2: Generate classes

In this step, we’re going to use jOOQ’s command line tools to generate classes that map to the Posts table we just created. The official docs are here.

I’m going to augment the command line steps a bit. The easiest way to generate a schema is to copy the jOOQ jar files (there should be 3) and the MySQL Connector jar file to a temporary directory. Create a properties file. I’ve created a file called jooq.properties that looks like this:

#Configure the database connection here

#The default code generator. You can override this one, to generate your own code style
#Defaults to org.jooq.util.DefaultGenerator

#The database type. The format here is:

#All elements that are generated from your schema (several Java regular expressions, separated by comma)
#Watch out for case-sensitivity. Depending on your database, this might be important!

#All elements that are excluded from your schema (several Java regular expressions, separated by comma). Excludes match before includes

#Primary key / foreign key relations should be generated and used. 
#This will be a prerequisite for various advanced features
#Defaults to false

#Generate deprecated code for backwards compatibility 
#Defaults to true

#The destination package of your generated classes (within the destination directory)

#The destination directory of your generated classes
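The values in the properties file above were stripped when this post was formatted. The listing below is my best reconstruction from the comments and jOOQ 1.6-era documentation — treat the exact keys as an approximation and check them against the codegen docs for your version; the username and directory are placeholders you’ll want to change:

```properties
#Configure the database connection here
jdbc.Driver=com.mysql.jdbc.Driver
jdbc.URL=jdbc:mysql://localhost:3306/guestbook
jdbc.Schema=guestbook
jdbc.User=root
jdbc.Password=

#The default code generator
generator=org.jooq.util.DefaultGenerator

#The database type
generator.database=org.jooq.util.mysql.MySQLDatabase

#Included / excluded elements (Java regular expressions)
generator.database.includes=.*
generator.database.excludes=

#Generate primary key / foreign key relations
generator.generate.relations=true

#The destination package and directory of your generated classes
generator.target.package=test.generated
generator.target.directory=/path/to/your/project/src
```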

One thing that wasn’t clear from jOOQ’s docs is the value of jdbc.Schema: it should be your database name. Since our database name is “guestbook”, that’s what we put. Replace the username with whatever user has the appropriate privileges: in my local dev database, my user has what is effectively root access to everything without a password. You’ll want to look at the other values and replace as necessary. Here are the two interesting properties:

generator.target.package – set this to the parent package you want to create for the generated classes. My setting of test.generated will cause the test.generated.Posts and test.generated.PostsRecord to be created

generator.target.directory – the directory to output to. Worst case scenario you can just copy the files to the package.

Once you have the JAR files and jooq.properties in your temp directory, type this:

java -classpath jooq-1.6.8.jar:jooq-meta-1.6.8.jar:jooq-codegen-1.6.8.jar:mysql-connector-java-5.1.18-bin.jar:. org.jooq.util.GenerationTool /jooq.properties

Note the slash prefix before jooq.properties. Even though the file is in our working directory, we need to prepend a slash.

Replace the filenames with your filenames. In this example, I’m using jOOQ 1.6.8. If everything has worked, you should see this in your console output:

Nov 1, 2011 7:25:06 PM org.jooq.impl.JooqLogger info
INFO: Initialising properties  : /jooq.properties
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Database parameters      
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ----------------------------------------------------------
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   dialect                : MYSQL
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   schema                 : guestbook
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   target dir             : /Users/ikai/Documents/workspace/MySQLTest/src
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO:   target package         : test.generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ----------------------------------------------------------
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Emptying                 : /Users/ikai/workspace/MySQLTest/src/test/generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating classes in    : /Users/ikai/workspace/MySQLTest/src/test/generated
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating schema        : Guestbook.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating factory       : GuestbookFactory.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Schema generated         : Total: 122.18ms
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Sequences fetched        : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Masterdata tables fetched: 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Tables fetched           : 5 (5 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating tables        : /Users/ikai/workspace/MySQLTest/src/test/generated/tables
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: ARRAYs fetched           : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Enums fetched            : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: UDTs fetched             : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating table         : Posts.java
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Tables generated         : Total: 680.464ms, +558.284ms
Nov 1, 2011 7:25:07 PM org.jooq.impl.JooqLogger info
INFO: Generating Keys          : /Users/ikai/workspace/MySQLTest/src/test/generated/tables
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Keys generated           : Total: 718.621ms, +38.157ms
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Generating records       : /Users/ikai/workspace/MySQLTest/src/test/generated/tables/records
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Generating record        : PostsRecord.java
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Table records generated  : Total: 782.545ms, +63.924ms
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Routines fetched         : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: Packages fetched         : 0 (0 included, 0 excluded)
Nov 1, 2011 7:25:08 PM org.jooq.impl.JooqLogger info
INFO: GENERATION FINISHED!     : Total: 791.688ms, +9.143ms

Step 3: Write a main class and establish a MySQL connection

Let’s just write a vanilla main class in the project containing the generated classes:

public class Main {

	public static void main(String[] args) {
		Connection conn = null;
		String userName = "ikai";
		String password = "";
		String url = "jdbc:mysql://localhost:3306/guestbook";
		try {
			conn = DriverManager.getConnection(url, userName, password);
		} catch (Exception e) {
			// You'll probably want to handle the exceptions in a real app.
			// Don't ever do this silent catch (Exception e) thing. I've seen it in
			// live code and it is horrendous.
			e.printStackTrace();
		}
	}
}

This is pretty standard code for establishing a MySQL connection.

Step 4: Write a query using jOOQ’s DSL

Let’s add a simple query:

			GuestbookFactory create = new GuestbookFactory(conn);
			Result&lt;Record&gt; result = create.select().from(Posts.POSTS).fetch();

We first need to get an instance of GuestbookFactory so we can write a simple SELECT query. We pass an instance of the MySQL connection to GuestbookFactory. Note that the factory doesn’t close the connection. We’ll have to do that ourselves.

We then use jOOQ’s DSL to return an instance of Result. We’ll be using this result in the next step.

Step 5: Iterate over results

After the line where we retrieve the results, let’s iterate over the results and print out the data:

			for (Record r : result) {
				Long id = r.getValueAsLong(Posts.ID);
				String title = r.getValueAsString(Posts.TITLE);
				String description = r.getValueAsString(Posts.BODY);
				System.out.println("ID: " + id + " title: " + title + " description: " + description);
			}

The full program should now look like this:

package test;

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.Record;
import org.jooq.Result;

import test.generated.GuestbookFactory;
import test.generated.tables.Posts;

public class Main {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		Connection conn = null;
		String userName = "ikai";
		String password = "";
		String url = "jdbc:mysql://localhost:3306/guestbook";
		try {
			conn = DriverManager.getConnection(url, userName, password);

			GuestbookFactory create = new GuestbookFactory(conn);
			Result&lt;Record&gt; result = create.select().from(Posts.POSTS).fetch();
			for (Record r : result) {
				Long id = r.getValueAsLong(Posts.ID);
				String title = r.getValueAsString(Posts.TITLE);
				String description = r.getValueAsString(Posts.BODY);
				System.out.println("ID: " + id + " title: " + title + " description: " + description);
			}
		} catch (Exception e) {
			// You'll probably want to handle the exceptions in a real app.
			// Don't ever do this silent catch (Exception e) thing. I've seen it in
			// live code and it is horrendous.
			e.printStackTrace();
		}
	}
}
Step 6: Profit!

Get a job and go to work like the rest of us.


I haven’t explored the more advanced bits of jOOQ, but, at least judging from the docs, it looks like there’s a lot of meat there. I’m hoping this guide makes it easier for new users to dive in.

– ikai
Currently listening: Sweat – Snoop Dogg vs David Guetta

Written by Ikai Lan

November 1, 2011 at 6:54 pm

On Hackathons, Process, Email and the Tragedy of the Commons

with 3 comments


I love hackathons. I love going to them, and I love running them. Most recently, I participated in a 48-hour hackathon in Kuala Lumpur, Malaysia. It’s one of the best parts of my job. I get to run (and sometimes participate) in both external hackathons, as well as hackathons that are internal to Google.

In early June I held an internal Hackathon at Google to teach employees how to best use the product I work on: Google App Engine. I consider the event a success: we had hundreds of RSVPs and a completely booked room. It was so successful, in fact, that I’m planning on holding at least one of these events a quarter. The breakdown was primarily newer employees, which didn’t surprise me given the amount of hiring we’ve been doing.

A primary driver for the sheer volume of RSVPs was the fact that we advertised the event on a mailing list that went out to pretty much all of engineering. All. Of. It. At an engineering company with headcount in the tens of thousands, hundreds of RSVPs were not only likely, they were pretty much a mathematical certainty. Looking back, we probably would not have received that response if we hadn’t sent out such a wide blast.

As a result of what I consider to be a fairly successful event (and I don’t mean to take all the credit here, at about the same time as my event, there was another very successful internal hackathon), various teams have suggested hackathons for their product APIs. There are events on the calendar.

Therein, of course, lies our problem. The problem of noise.

What should we do? Email all of engineering for every event? Create a new list/site/page announcing new events? Let’s break down the tradeoffs for each choice:

1. Email all of engineering

Pros: goes to everyone

Cons (and this is the bigger point of this post): the majority of events will be irrelevant to most recipients, causing the signal-to-noise ratio on the list to drop significantly and people to filter out these announcements

2. Create a new distribution channel for events

Pros: Opt in

Cons: You don’t get the distribution you’d get with #1, since only a minority of people will opt in. It also has the same SNR problems.

Now, a hybrid solution would be to do both. High profile, important events go to all of engineering, and smaller events go to the special distribution channel. The issue here is that everyone’s event is high profile. So again, we don’t have a great solution. Not to mention: people can only attend so many hackathons and still be able to do all the stuff they’re supposed to be doing. See, that’s one of the great things about Google engineering. If you’re consistently delivering, there isn’t a manager in the company that will tell you not to attend a hackathon or internal event where you can only get better at what you do. The issue, of course, is that the more hackathons take place, the more you are likely taking resources away from other teams for non-trivial amounts of time. From a hackathon organizer’s perspective, a hackathon is almost always beneficial as long as some non-zero number of participants shows up: they learn about your API and provide feedback, and you learn a bit about how to improve the documentation or SDK. You almost can’t afford not to throw a hackathon.

This is the classic example of the tragedy of the commons. By running an event, you consume space. You consume employee time. You generate noise on all the distribution channels. And when everyone does it, suddenly, as a whole, everyone is worse off, though you yourself may individually gain.


Another key example of the tragedy of the commons is a company’s email marketing. I worked at a consumer internet company that broke teams down by product. To drive usage metrics for an individual product, the product managers would run email campaigns to the site’s millions of users. The result was that the individual product would see a bump in usage, and everyone would give themselves a pat on the back. What was actually happening was that it was causing users to become extremely irritated at the company (myself included) for the voluminous amounts of email being sent all the time. Sure, you could go to the site settings and disable email, but new products would automatically opt you in to receiving notifications, and you would have to log back into the site to find the settings and disable those notifications as well. Some users, like myself, have created Gmail filters to send all emails from this company’s domain straight to a “Stupid Mail” label. I can understand the individual product managers’ reasoning. You don’t want to be the one team that doesn’t deliver metrics, so you email spam. And when everyone email spams, it’s to the detriment of the company overall. An employee posted to an internal group pointing out that this was an example of the tragedy of the commons – I don’t know if his point was ever heeded, but based on the complaints I see on Twitter about email, my guess is no.


I view team processes the same way, and this sometimes leads to some very heated discussions with people I work with. It’s not that I don’t believe making your 1-step process a 5-step process makes your life easier or the company better organized; it’s that everybody wants to turn their one-step, lightweight, free-form processes into full-on, form-driven, strict-requirements-based, signed-in-triplicate procedures. I fight heavy processes when I can because I don’t believe enough people do so. Why? The tragedy of the commons. An extra 20 minutes here, an extra 20 minutes there, and suddenly I am spending most of my day tangled in process instead of getting things done.

There are no easy solutions to this, of course. Some process is necessary, though from the outset, it isn’t always obvious which processes those are. How do you know, for instance, if a process is unnecessary? A good example is a managerial approval step in a process. Let’s say I need approval to do something. How do I evaluate whether managerial approval is working?

  • What is the cost of doing it wrong? What was the bad outcome?
  • How many times, prior to the institution of the process, would that approval have prevented a bad outcome?
  • Is the manager rubber stamping requests?

What absolutely needs to be done are constant evaluations of process. Don’t create a process and sit on it. Make it better. What can you take away, and still have it work? Think about your last trip to the DMV. How many steps could have been eliminated?

Awareness of the bigger picture

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

Antoine de Saint-Exupery
French writer (1900 – 1944)

I suppose that’s the solution to fighting the tragedy of the commons. A constant awareness of the bigger picture and a real desire to make things better. An understanding that many things in this world are a zero sum game. I’ll offer a word of caution, of course: you can probably only champion a few things. Champion fixing everything, and people will stop listening to you, you’ll lose focus, and you’ll end up fixing very little. What do we call this effect? No, I won’t bother. Hopefully you actually read this and already know.

– Ikai

Written by Ikai Lan

July 16, 2011 at 2:24 pm

Setting up an OAuth provider on Google App Engine

with 25 comments

App Engine provides an API for easily creating an OAuth provider. In this blog post, I’ll describe the following steps:

  1. Create and deploy an App Engine application that implements the OAuth API
  2. Add a new domain to your Google Account. Verify this domain.
  3. Connecting an OAuth client to make requests against your application

I’ll avoid a deep explanation of OAuth for now. You can find everything you need to know about OAuth in the Beginner’s Guide to OAuth.

Get the code

The code that goes along with this blog post is available here:


The two most important files are:

  • python/oauth_client.py
  • src/com/ikai/oauthprovider/ProtectedServlet.java

Step 1: Create and deploy an App Engine application that uses the OAuth API

Create a new App Engine Java application. I’ve created a servlet called ProtectedServlet:

package com.ikai.oauthprovider;

import com.google.appengine.api.oauth.OAuthRequestException;
import com.google.appengine.api.oauth.OAuthService;
import com.google.appengine.api.oauth.OAuthServiceFactory;
import com.google.appengine.api.users.User;

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ProtectedServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        User user = null;
        try {
            OAuthService oauth = OAuthServiceFactory.getOAuthService();
            user = oauth.getCurrentUser();
            resp.getWriter().println("Authenticated: " + user.getEmail());
        } catch (OAuthRequestException e) {
            resp.getWriter().println("Not authenticated: " + e.getMessage());
        }
    }
}
This servlet is incredibly simple. We retrieve an instance of OAuthService via OAuthServiceFactory and attempt to fetch the current user. Note that the User instance is the same kind of instance as a User returned by UserService. That’s because a User is still expected to sign in via a Google Account.

The method getCurrentUser() takes care of all of the OAuth signature verification. If something goes wrong – say, the request is not signed, or the signature is invalid, or the client’s timestamp is outside of the acceptable skew, or the nonce is repeated – OAuthService throws OAuthRequestException.
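To make those failure modes concrete, here is a simplified Python 3 sketch of the checks an OAuth 1.0 provider performs. This is not App Engine’s implementation, and the base-string construction below cuts corners relative to the OAuth 1.0 spec — it’s only meant to illustrate signature, timestamp, and nonce validation:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def _enc(s):
    return urllib.parse.quote(str(s), safe="")


def sign(method, url, params, consumer_secret, token_secret=""):
    # Normalize the request into a "base string", then HMAC-SHA1 it with a
    # key derived from the consumer secret and (optional) token secret.
    normalized = "&".join(
        "%s=%s" % (_enc(k), _enc(v)) for k, v in sorted(params.items())
    )
    base_string = "&".join([method.upper(), _enc(url), _enc(normalized)])
    key = ("%s&%s" % (_enc(consumer_secret), _enc(token_secret))).encode()
    digest = hmac.new(key, base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def verify(method, url, params, consumer_secret, token_secret, seen_nonces):
    # 1. Reject timestamps outside the acceptable clock skew.
    if abs(time.time() - int(params["oauth_timestamp"])) > 300:
        raise ValueError("timestamp outside acceptable skew")
    # 2. Reject replayed nonces.
    if params["oauth_nonce"] in seen_nonces:
        raise ValueError("nonce already used")
    seen_nonces.add(params["oauth_nonce"])
    # 3. Recompute the signature and compare it to the claimed one.
    unsigned = {k: v for k, v in params.items() if k != "oauth_signature"}
    expected = sign(method, url, unsigned, consumer_secret, token_secret)
    if not hmac.compare_digest(expected, params["oauth_signature"]):
        raise ValueError("invalid signature")
```

A request whose timestamp is stale, whose nonce has been seen before, or whose signature doesn’t recompute fails the respective check — exactly the family of errors that surfaces as OAuthRequestException.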

We can run this code locally, but it won’t work: when run locally, oauth.getCurrentUser() always returns a test user. We’ll need to deploy it to App Engine before it’ll do real verification. After deploying, we can test the servlet. I have the servlet mapped to /resource. When we browse to this URL, we see:

Not authenticated: Unknown

That’s okay. We expect to see this because we’re sending a vanilla GET to this API.

2. Add a new domain to your Google Account. Verify this domain

OAuth clients require a consumer key and consumer secret. We need to generate these. Browse to the “Manage Domains” page:


It should look like this:

Add the base URL of our App Engine app into the text box in the “Add a New Domain” section and click “Add domain”. For instance, I entered: http://ikai-oauth.appspot.com.

We’ll be taken to a new page where we need to verify ownership of the application:

Download the HTML verification file and place it into our war directory. Deploy this new version of the application to App Engine. Once we have confirmed that the page is serving, click “Verify” to complete the verification process.

Once we have verified our domain, we will be asked to accept the Terms of Service and enter a few settings. Only the authsub setting is required; we can enter anything we want here because we will not be using AuthSub. We will then be presented with an OAuth consumer key and OAuth consumer secret. The OAuth consumer key is simply the domain, whereas the consumer secret is an autogenerated shared secret that clients will use.

Now that we have these values, we can move on to step 3.

3. Connecting an OAuth client to make requests against your application

As of the time of this writing, App Engine only supports OAuth 1.0.

Below is a basic script that will do the 3-legged OAuth dance, cache access tokens locally and make API calls. To run this script, you will need to install the python-oauth2 library. If you have git installed, the commands to install the library on a *nix-like system are:

git clone https://github.com/simplegeo/python-oauth2.git
cd python-oauth2
sudo python setup.py install

This installs the oauth2 library into your Python install so you can import it when you need it.

Now we can run the script to make authenticated calls against our app. Note that we’ll want to substitute the consumer_secret and app_id values with values that map to your application ID and consumer secret:

import oauth2 as oauth
import urlparse
import os
import pickle

app_id = "your_app_id_here"
url = "http://%s.appspot.com/resource" % app_id

consumer_key = '%s.appspot.com' % app_id
consumer_secret = 'your_consumer_secret_here'

access_token_file = "token.dat"

request_token_url   = "https://%s.appspot.com/_ah/OAuthGetRequestToken" % app_id
authorize_url       = "https://%s.appspot.com/_ah/OAuthAuthorizeToken" % app_id
access_token_url    = "https://%s.appspot.com/_ah/OAuthGetAccessToken" % app_id

consumer = oauth.Consumer(consumer_key, consumer_secret)

if not os.path.exists(access_token_file):

    client = oauth.Client(consumer)

    # Step 1: Get a request token. This is a temporary token that is used for 
    # having the user authorize an access token and to sign the request to obtain 
    # said access token.

    resp, content = client.request(request_token_url, "GET")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])

    request_token = dict(urlparse.parse_qsl(content))

    print "Request Token:"
    print "    - oauth_token        = %s" % request_token['oauth_token']
    print "    - oauth_token_secret = %s" % request_token['oauth_token_secret']

    print "Go to the following link in your browser:"
    print "%s?oauth_token=%s" % (authorize_url, request_token['oauth_token'])

    # After the user has granted access to you, the consumer, the provider will
    # redirect you to whatever URL you have told them to redirect to. You can 
    # usually define this in the oauth_callback argument as well.
    accepted = 'n'
    while accepted.lower() == 'n':
            accepted = raw_input('Have you authorized me? (y/n) ')

    # Step 3: Once the consumer has redirected the user back to the oauth_callback
    # URL you can request the access token the user has approved. You use the 
    # request token to sign this request. After this is done you throw away the
    # request token and use the access token returned. You should store this 
    # access token somewhere safe, like a database, for future use.
    token = oauth.Token(request_token['oauth_token'],
        request_token['oauth_token_secret'])
    client = oauth.Client(consumer, token)

    resp, content = client.request(access_token_url, "POST")
    access_token = dict(urlparse.parse_qsl(content))

    print "Access Token:"
    print "    - oauth_token        = %s" % access_token['oauth_token']
    print "    - oauth_token_secret = %s" % access_token['oauth_token_secret']
    print "You may now access protected resources using the access tokens above." 

    token = oauth.Token(access_token['oauth_token'],
        access_token['oauth_token_secret'])

    with open(access_token_file, "w") as f:
        pickle.dump(token, f)
else:
    with open(access_token_file, "r") as f:
        token = pickle.load(f)

client = oauth.Client(consumer, token)
resp, content = client.request(url, "GET")
print "Response Status Code: %s" % resp['status']
print "Response body: %s" % content

(The basis for this script was shamelessly stolen from Joe Stump’s sample oauth2 code for his Python library on Github.)

Once we run the script using:

python oauth_client.py

we should see:

Request Token:
- oauth_token        = SOME_OAUTH_REQUEST_TOKEN_VALUE
- oauth_token_secret = SOME_OAUTH_REQUEST_SECRET_VALUE

Go to the following link in your browser:

Have you authorized me? (y/n)

The OAuth token and token secret values are generated by the script using a combination of random values and the consumer key/secret pair. With these values, known as request tokens, the script generates an authorization URL where an end user can bless our client, allowing it to make OAuth requests on behalf of the user who grants authorization.

At this point, the script pauses for input. As part of the OAuth dance, we need to browse to the URL provided and authorize the script. Copy/paste this URL into your browser window and click “Grant Access”:

Once we see a page that says:

You have successfully granted ikai-oauth.appspot.com access to your Google Account. You can revoke access at any time under ‘My Account’.

We can switch back to our terminal window and hit “y”. The client now exchanges our request tokens for access tokens. Access tokens are what you need to make API calls. The script outputs this:

Access Token:
- oauth_token        = SOME_OAUTH_ACCESS_TOKEN
- oauth_token_secret = SOME_OAUTH_ACCESS_TOKEN_SECRET

You may now access protected resources using the access tokens above.

Response Status Code: 200
Response body: Authenticated: the-account-you-logged-in-with@gmail.com

The Python script caches the access token in a file called token.dat, so the next time we run oauth_client.py, we skip the authorization dance and can directly make API calls:

$ python oauth_client.py
Response Status Code: 200
Response body: Authenticated:the-account-you-logged-in-with@gmail.com

That’s all there is to it!

Final notes and general tips

Setting up an OAuth provider using App Engine’s API is incredibly simple once we know all the steps. Setting up the provider is just a matter of a few lines of code, and the steps to set up the client are pretty straightforward. The most difficult part is setting up the consumer key and secret, but even that isn’t so bad once we know where the management interface is.

When possible, use OAuth instead of ClientLogin. This goes for web applications, mobile applications, desktop apps, and even command line scripts. OAuth allows users to revoke your access token and trains users not to arbitrarily give out their Google Account password to any interface that asks for it. For building clients, it also gives you a way to do client authentication without having to cache credentials – using ClientLogin too often results in CaptchaRequiredException being thrown, anyway.

– Ikai


Github sample code:

App Engine/Java OAuth docs: http://code.google.com/appengine/docs/java/oauth/overview.html

Domain management – get your consumer key/secret here: https://www.google.com/accounts/ManageDomains

Python OAuth client code: https://github.com/simplegeo/python-oauth2

Written by Ikai Lan

May 26, 2011 at 5:23 pm