Ikai Says…

Debugging your Google OAuth 2.0 token when you get HTTP 401s or 403s

Ikai Lan — Fri, 19 Jul 2013 10:07:00 +0000

One of the things I get asked about most is OAuth 2.0, usually once developers start seeing 401s, 403s, and other HTTP 4xx status codes. This post isn’t meant to be a comprehensive guide to OAuth debugging for Google/YouTube APIs. Rather, it’s a collection of the steps I find myself recommending or repeating when I’m trying to debug issues with OAuth authorization.

I’ve recorded a short video describing how to enable Google API access for use with web and installed apps:

The checklist is:

Did you create a new API project?
Did you enable the APIs you are looking to use?
Did you create a client ID and client secret?

A common mistake developers make is forgetting step #2: enabling the APIs.

Getting a token

Here are my cliff notes on tokens:

Access tokens – used to make API calls. These expire after an hour.
Refresh tokens – you only get these if you request offline access when you ask a user to authorize. These are exchanged by your client for access tokens. Refresh tokens generally don’t expire.
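To make the refresh step concrete, here’s a minimal sketch of the form-encoded body a client POSTs to exchange a refresh token for a new access token. It assumes the standard OAuth 2.0 refresh_token grant parameters and Google’s token endpoint as it existed around the time of writing (https://accounts.google.com/o/oauth2/token); the RefreshRequest class and its method are hypothetical, and the sketch only builds the body rather than sending it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the form body for a refresh_token grant. */
final class RefreshRequest {
    // Assumption: Google's token endpoint at the time of writing; check current docs.
    static final String TOKEN_ENDPOINT = "https://accounts.google.com/o/oauth2/token";

    static String formBody(String clientId, String clientSecret, String refreshToken) {
        // LinkedHashMap keeps the parameters in a predictable order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("client_id", clientId);
        params.put("client_secret", clientSecret);
        params.put("refresh_token", refreshToken);
        params.put("grant_type", "refresh_token");

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return body.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

The body would be sent with Content-Type application/x-www-form-urlencoded; the JSON response carries the new access token.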

What causes 401s and other 4xx status codes?

The common causes for “401 Unauthorized” when making API calls with an access token are:

Expired access token (most common)
Developer accidentally disabled the APIs (uncommon)
User revokes token (rare)
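Since expiry is the most common cause, it helps to refresh proactively instead of waiting for a 401. Here’s a minimal sketch, assuming you record when a token was issued along with the expires_in value (in seconds) returned with it; CachedAccessToken and needsRefresh are hypothetical names, not part of any Google client library.

```java
/**
 * Hypothetical sketch: remembers when a cached access token expires so it
 * can be refreshed before an API call fails with a 401.
 */
final class CachedAccessToken {
    private final String token;
    private final long expiresAtMillis;

    CachedAccessToken(String token, long issuedAtMillis, long expiresInSeconds) {
        this.token = token;
        this.expiresAtMillis = issuedAtMillis + expiresInSeconds * 1000L;
    }

    /** True if the token has expired, or will expire within marginSeconds of nowMillis. */
    boolean needsRefresh(long nowMillis, long marginSeconds) {
        return nowMillis + marginSeconds * 1000L >= expiresAtMillis;
    }

    String token() {
        return token;
    }
}
```

A small margin (say a minute) covers clock skew and the time the request spends in flight.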

Sometimes, more explanation exists in the response body of an HTTP 4xx. In the Java client, for example, you should catch the exception and log the error details, because they will help with troubleshooting:

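try {
    // Make your Google API call here
} catch (GoogleJsonResponseException e) {
    GoogleJsonError error = e.getDetails(); // may be null if the body wasn't JSON
    System.err.println("HTTP status: " + e.getStatusCode());
    if (error != null) {
        // Print out the message and the individual errors
        System.err.println("Message: " + error.getMessage());
        System.err.println("Errors: " + error.getErrors());
    }
}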

Different versions will have different API signatures. Here’s a link to the current version’s Javadocs (1.1.15) – newer versions look like they might be deviating from this a bit.

Troubleshooting the token

Whenever you get an HTTP 4xx, you could have your existing code call the tokenInfo API with the access token and log the response. This’ll return some useful information:

When the token expires
What’s the token’s scope (this is important)
If the token is invalid

If you didn’t write the code yourself and inherited it:

Whether this access token came from an offline refresh_token or not (the “offline” field)
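Here’s a minimal sketch of that tokenInfo check using only the JDK’s HttpURLConnection. It assumes the v1 tokeninfo endpoint URL shown in the code (https://www.googleapis.com/oauth2/v1/tokeninfo), which may have changed since this was written; the TokenInfo class is hypothetical. It returns the raw HTTP status and body so you can log them, including the error body for an invalid token.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that asks the tokeninfo endpoint about an access token. */
final class TokenInfo {
    // Assumption: the v1 tokeninfo endpoint; check the current docs before relying on it.
    static final String TOKENINFO_URL = "https://www.googleapis.com/oauth2/v1/tokeninfo";

    static String tokenInfoUrl(String accessToken) {
        try {
            return TOKENINFO_URL + "?access_token=" + URLEncoder.encode(accessToken, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Returns "HTTP <status>" followed by the response body, for logging. */
    static String fetch(String accessToken) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(tokenInfoUrl(accessToken)).openConnection();
        try {
            int status = conn.getResponseCode();
            // An invalid token is expected to come back as a 4xx with an error body.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            StringBuilder out = new StringBuilder("HTTP ").append(status).append('\n');
            if (in == null) {
                return out.toString();
            }
            try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            return out.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

For a valid token, the JSON body should include the expiry, scope and offline information described above.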

If the token is invalid … well, that doesn’t help a lot. I would troubleshoot like this:

Remove the access token from your datastore or database.
Use the refresh token to acquire a new access token (if you are using a refresh token)
Try to make the API call again. If it works, you’re good! If not …
Check the access token against the tokenInfo API
If it’s still invalid, do a full reauth
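The steps above can be sketched as a single recovery routine. This is a minimal sketch under assumptions: TokenStore, TokenRefresher, ApiCall and TokenChecker are hypothetical interfaces standing in for your datastore, your refresh-token exchange, your API call and a tokenInfo lookup; none of them come from a Google client library. If there is no refresh token, the sketch goes straight to a full reauth.

```java
import java.util.Optional;

/** Hypothetical sketch of the token troubleshooting steps. */
final class TokenRecovery {
    interface TokenStore {
        void deleteAccessToken(String userId);
        Optional<String> refreshToken(String userId);
        void saveAccessToken(String userId, String accessToken);
    }

    interface TokenRefresher { String refresh(String refreshToken); } // refresh-token exchange
    interface ApiCall { int execute(String accessToken); }            // returns the HTTP status
    interface TokenChecker { boolean isValid(String accessToken); }   // e.g. a tokenInfo lookup

    enum Outcome { OK, TOKEN_VALID_CHECK_SCOPE_AND_APIS, NEEDS_REAUTH }

    static Outcome recover(String userId, TokenStore store, TokenRefresher refresher,
                           ApiCall call, TokenChecker checker) {
        // 1. Remove the access token from the datastore.
        store.deleteAccessToken(userId);

        // 2. Use the refresh token, if there is one, to get a new access token.
        Optional<String> refreshToken = store.refreshToken(userId);
        if (!refreshToken.isPresent()) {
            return Outcome.NEEDS_REAUTH;
        }
        String accessToken = refresher.refresh(refreshToken.get());
        store.saveAccessToken(userId, accessToken);

        // 3. Retry the API call.
        if (call.execute(accessToken) < 400) {
            return Outcome.OK;
        }

        // 4. Check the new token. If it is valid, the problem lies elsewhere
        //    (for example the token's scope, or the API not being enabled).
        if (checker.isValid(accessToken)) {
            return Outcome.TOKEN_VALID_CHECK_SCOPE_AND_APIS;
        }

        // 5. Still invalid: the user has to authorize again.
        return Outcome.NEEDS_REAUTH;
    }
}
```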

Hope this helps!

Clearing up some things about LinkedIn mobile’s move from Rails to node.js

Ikai Lan — Thu, 04 Oct 2012 10:02:00 +0000

There’s an article on highscalability that’s talking about the move from Rails to node.js (for completeness: its sister discussion on Hacker News). It’s not the first time this information has been posted. I’ve kind of ignored it for now (because I didn’t want to be this guy), but it’s come up enough times and no one has spoken up, so I suppose it’s up to me to clear a few things up.

I was on the team at LinkedIn that was responsible for the mobile server, and while I wasn’t the primary contributor to that stack, I built and contributed several things, such as the unfortunate LinkedIn WebOS app which made use of the mobile server (and a few features) and much of the initial research behind productionizing JRuby for web applications (I did much more stuff that wasn’t published). I left LinkedIn in 2009, so I apologize if any new information has surfaced. My hunch is that even if I’m off, I’m not off by that much.

Basically: the article is leaving out several facts. We can all learn something from the mobile server and software engineering if we know the full story behind the whole thing.

In 2008, I joined a software engineering team at LinkedIn that was focused on building things outside the standard Java stack. You see, back then, to develop code for linkedin.com, you needed a Mac Pro with 6 GB of RAM just to run your code. And those requirements kept growing. If my calculations are correct, the standard setup for engineers now is a machine with 20 or more gigabytes of RAM just to RUN the software. In addition, each team could only release once every 6 weeks (this has been fixed in the last few years). It was deemed that we needed to build out a platform on top of the then-fledgling API and start creating teams that could get off the 6-week release cycle so we could iterate quickly on new features. The team I was on, LED, was created for this purpose.

Our first project was a rotating globe that showed off new members joining LinkedIn. It used to run on Poly9, but when they got shut down, it looks like someone migrated it to use Google Earth. The second major project was m.linkedin.com, a mobile web client for LinkedIn that would be one of the major clients of our fledgling API server, codenamed PAL. Given that we were building out an API for third parties, we figured that we could eat our own dogfood and build out LinkedIn for mobile phones with browsers. This is 2008, mind you. The iPhone had just come out, and it was a very BlackBerry world.

The stack we chose was Ruby on Rails 1.2, and the deployment technology was Mongrel. Remember, this is 2008. Mongrel was cutting-edge Ruby technology. Phusion Passenger hadn’t been released yet (more on this later), and Mongrel was light-years ahead of FastCGI. The problem with Mongrel? It’s single-threaded. It was deemed that shipping fast was more important than CPU efficiency, a choice I agreed with. We were one of the first products at LinkedIn to do i18n (well, we only did translations) via gettext. We deployed using Capistrano, and were the first ones to use nginx. We did a lot of other cool stuff, like experimenting with Redis and learning a lot about memcached in production (nowadays this is a given, but there was a lot of memcached vs EHCache talk back then). Etc., etc. But I’m not trying to talk about how much fun I had on that team. Well, not primarily.

I’m here to clear up facts about the post about moving to node.js. And to do that, I’m going to go back to my story.

The iPhone SDK had shipped around that time. We didn’t have an app ready for launch, but we wanted to build one, so our team did, and we inadvertently became the mobile team. So we suddenly decided that this array of Rails servers, which was basically rendering Rails rhtml templates and making API calls to PAL (which, back then, used a pre-OAuth token exchange technology that was strikingly similar), would also be the primary API server for the iPhone client and any other rich mobile client we’d end up building. We upgraded to Rails 2.x+ so we could use the respond_to directive for different output formats. Why didn’t we connect the iPhone client directly to PAL? I don’t remember. Oh, and we also decided to use OAuth for authenticating the iPhone client. Three-legged OAuth, so we also turned those Rails servers into OAuth providers. Why did we use 3-legged OAuth? Simple: we had no idea what we were doing. I’LL ADMIT IT.

Did I mention that we hosted outside the main data centers? This is what Joyent talks about when they say they supplied LinkedIn with hosting. They never hosted linkedin.com proper on Joyent, but we had a long provisioning process for getting servers in the primary data center, and there were these insane rules about no scripting languages in production, so we decided it was easier to adopt an outside provider when we needed more capacity.

Here’s what you were accessing if you were using the LinkedIn iPhone client:

iPhone -> m.linkedin.com (running on Rails) -> LinkedIn’s API (which, for all intents and purposes, only had one client, us)

That’s a cross data center request, guys. Running on single-threaded Rails servers (every request blocked the entire process), running Mongrel, leaking memory like a sieve (this was mostly the fault of gettext). The Rails server did some stuff, like translations and transformation of XML to JSON, and we tested out some new mobile-only features on it, but beyond that it didn’t do a lot. It was little more than a proxy. A proxy whose maximum concurrency was the number of single-threaded Mongrel servers we were running. The Mongrels, as we affectionately referred to them, often bloated up to 300 MB of RAM each, so we couldn’t run many of them.

At this time, I was busy productionizing JRuby. JRuby, you see, could take full advantage of Rails’ ability to serve concurrent requests using JVM concurrency. In addition, JRuby outperformed MRI in almost every real benchmark I threw at it; there were maybe 1 or 2 specific benchmarks where it didn’t. I knew that if we ported the mobile server to JRuby, we could have gotten more performance and way more concurrency. We would have kept the same ability to deploy fast, with the option of calling directly into many of the Java libraries LinkedIn was using.

But we didn’t. Instead, the engineering manager at the time ruled in favor of Phusion Passenger, which, to be fair, was an easier port than JRuby. We had come to depend on various native extensions, gettext being the key one, and we didn’t have time to port the translations to something that was JRuby friendly. I was furious, of course, because I had been evangelizing JRuby as the best Ruby production environment and no one was listening, but that’s a different story for a different time. Well, maybe some people listened; those Square guys come to mind.

This was about the time I left LinkedIn. As far as I know, they didn’t build a ton more features. Someone told me that one of my old teammates suddenly became fascinated with node.js, and pretty much singlehandedly decided to rewrite the mobile server using node. Node was definitely a better fit for what we were doing, since we were constantly blocking on a cross data center call, and non-blocking I/O servers have been shown to be highly advantageous from a performance perspective for this kind of workload. Not to mention: we never intended for the original Ruby on Rails server to be used as a proxy for several years.

So, knowing all the facts, what are all the takeaways?

Is v8 faster than MRI? MRI is generally slower than YARV (Ruby 1.9), and, at least in these benchmarks, I don’t think there is any question that v8 is freakin’ fast. If node.js blocked on I/O, however, this fact would have been almost completely irrelevant.
The rewrite factor. How many of us have been on a software engineering project where the end result looked nothing like what we planned to build in the first place? And where, knowing the full requirements, we know that if given the time and the opportunity to rebuild it from scratch, it would have been way better? Not to mention: I grew a lot at LinkedIn as a software engineer, so the me of several years later would have done a far better job than the me of 2008. Experience does matter.
I see that one of the advantages of the mobile server being in node.js is people could “leverage” (LinkedIn loves that word) their Javascript skills. Well, LinkedIn had/has hundreds of Java engineers! If that was a concern, we would have spent more time exploring Netty. Lies, damn lies, and benchmarks, I always say, but I think it’s safe for us to say that Netty (this is vertx, which sits on top of Netty) is at least as fast as node.js for web serving.
Firefighting? That was probably a combination of several things: the fact that we were running MRI and leaked memory, or the fact that the ops team was 30% of a single guy.

What I’m saying here is use your brain. Don’t read the High Scalability post and assume that you must build your next technology using node.js. It was definitely a better fit than Ruby on Rails for what the mobile server ended up doing, but it is not a performance panacea. You’re comparing a lower level server to a full stack web framework.

That’s all for tonight, folks, and thank you internet for finally goading me out of hiding again.

– Ikai

Getting started with jOOQ: A Tutorial

Ikai Lan — Tue, 01 Nov 2011 19:56:00 +0000

Introduction

I accidentally stumbled onto jOOQ a few days ago while doing a lot of research on Hibernate. Funny how things work, isn’t it? For those of you that aren’t familiar with it, jOOQ is a different approach to the over-ORMing of Java persistence. Rather than try to map database tables to Java classes and abstract away the SQL underneath, jOOQ assumes you want low level control over the SQL queries you execute, and provides a mostly typesafe interface for executing queries. I don’t have anything against simple ORMs, but it’s good to have the right tool for the right job. From the jOOQ homepage:

Instead of this SQL query:

SELECT * FROM BOOK
   WHERE PUBLISHED_IN = 2011
ORDER BY TITLE

You would execute this Java code:

create.selectFrom(BOOK)
      .where(PUBLISHED_IN.equal(2011))
      .orderBy(TITLE)
      .fetch();

Why a Java interface? Type safety, for one. Programmatically using jOOQ’s DSL has some advantages over writing SQL queries by hand, such as IDE support and compile time checking of some things.

The idea interested me, so I dug in. Unfortunately, the jOOQ site’s documentation, while fairly comprehensive, DOES NOT PROVIDE AN END TO END “GETTING STARTED” PAGE!!! This means that if you want to learn jOOQ, you’ll have to jump to the chapter about meta model code generation, then to the DSL, then to the jOOQ classes section. It’s a bit of a mess for new users. Google search also didn’t turn up many useful results, so I figured I’d whip up a quick “Getting started” guide. We’re going to go over the following steps:

Preparation: Download jOOQ and your SQL driver
Step 1: Create a SQL database and a table
Step 2: Generate classes
Step 3: Write a main class and establish MySQL connection
Step 4: Write a query using jOOQ’s DSL
Step 5: Iterate over results
Step 6: Profit!

Ready? Let’s get started.

Setting up an OAuth provider on Google App Engine

Ikai Lan — Thu, 26 May 2011 10:09:00 +0000

App Engine provides an API for easily creating an OAuth provider. In this blog post, I’ll describe the following steps:

Create and deploy an App Engine application that implements the OAuth API
Add a new domain to your Google Account. Verify this domain.
Connect an OAuth client to make requests against your application

I’ll avoid a deep explanation of OAuth for now. You can find everything you need to know about OAuth in the Beginner’s guide to OAuth.

Get the code

The code that goes along with this blog post is available here:

https://github.com/ikai/appengine-oauth-java-server-python-client-sample

The two most important files are:

python/oauth_client.py
src/com/ikai/oauthprovider/ProtectedServlet.java

App Engine datastore tip: monotonically increasing values are bad

Ikai Lan — Tue, 25 Jan 2011 10:00:00 +0000

When saving entities to App Engine’s datastore at a high write rate, avoid monotonically increasing values such as timestamps. Generally speaking, you don’t have to worry about this sort of thing until your application hits 100s of queries per second. Once you’re in that ballpark, you may want to examine potential hotspots in your application that can increase datastore latency.

To explain why this is, let’s examine what happens to the underlying Bigtable of an application with a high write rate. When a Bigtable tablet, a contiguous unit of storage, experiences a high write rate, the tablet will have to “split” into more than one tablet. This “split” allows new writes to shard. Here’s a visual approximation of what happens:

There’s a moment of pain – this is one of the causes of datastore timeouts in high write applications, as discussed in Nick Johnson’s article, “Handling Datastore Errors”.

Remember that for indexed values, we must write corresponding index rows. When values are randomly or even semi-randomly distributed, like, say, user email addresses, tablet splits function well. This is because the work to write multiple values is distributed amongst several Bigtable tablets:

The problems appear when we start saving monotonically increasing values like timestamps, or insert dictionary words in alphabetical order:

The new writes aren’t evenly distributed, and whichever tablet receives them becomes a new hot tablet in need of a split.

As a developer, what can you do to avoid this situation?

Avoid indexes unless you need to query against the values. No index = no hot tablet on increasing value
Lower your write rate, or figure out how to better distribute values. A pure random distribution is best, but even a distribution that isn’t random will be better than a predictable, monotonically increasing value
Prefix a shard identifier to your value. This is problematic if you plan on doing queries, as you will need to prefix and unprefix the values, then join the results in memory – but it will reduce the error rate of your writes
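Here’s a minimal sketch of the shard-prefix idea from the last tip, in plain Java. It assumes a string-valued property, a hypothetical ShardedValue helper, and an arbitrary NUM_SHARDS of 16. Zero padding keeps non-negative values within a shard in numeric order; because the shard is chosen at random per write, a query has to run once per shard and merge the unprefixed results in memory.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical helper: spreads a monotonically increasing value (here a
 * timestamp) across shards by prefixing a shard identifier.
 */
final class ShardedValue {
    static final int NUM_SHARDS = 16; // assumption: tune for your write rate

    /** Picks a random shard for this write. */
    static String prefixed(long timestampMillis) {
        return prefixed(ThreadLocalRandom.current().nextInt(NUM_SHARDS), timestampMillis);
    }

    /** Zero-pads both parts so string order within a shard matches numeric order. */
    static String prefixed(int shard, long timestampMillis) {
        if (shard < 0 || shard >= NUM_SHARDS) {
            throw new IllegalArgumentException("bad shard " + shard);
        }
        return String.format("%02d:%019d", shard, timestampMillis);
    }

    /** Removes the prefix to recover the original value, e.g. after a per-shard query. */
    static long unprefix(String value) {
        return Long.parseLong(value.substring(value.indexOf(':') + 1));
    }
}
```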

The tips are applicable whether you are on Master-Slave or High Replication datastore. And one more tip: don’t prematurely optimize for this case, since chances are, you won’t run into it. You can be spending that time working on features.

– Ikai

P.S. Yes, I drew those doodles. No, I do not have any formal art training (how could you tell?!)

GWT, Blobstore, the new high performance image serving API, and cute dogs on office chairs

Ikai Lan — Wed, 08 Sep 2010 20:01:00 +0000

I’ve been working on an image sharing application using GWT and App Engine to familiarize myself with the newer aspects of GWT. The project and code are here:

http://ikai-photoshare.appspot.com
http://github.com/ikai/gwt-gae-image-gallery

(Please excuse spaghetti code in client side GWT code, much of it was me feeling my way around GWT. I’ve come to appreciate GWT quite a bit in spite of the fact that I’m pretty familiar with client side development; I’ll write about this in a future post).

The 1.3.6 release of the App Engine SDK shipped with a high performance image serving API. What this means is that a developer can take a blob key pointing to image data stored in the blobstore and call getServingUrl() to create a special URL for serving the image. What are the benefits to using this API?

You don’t have to write your own handler for uploaded images
You don’t have to consume storage quota for saving resized or cropped images, as you can perform transforms on the image simply by appending URL parameters. You only need to store the final URL that is generated by getServingUrl().
You aren’t charged for datastore CPU for fetching the image (you will still be billed for bandwidth)
Images are, in general, served from edge server locations which can be geographically located closer to the user

There are a few drawbacks, however, to using the API:

There aren’t any great schemes for access control of the images, and if someone has the URL for a thumbnail, they can easily remove the parameters to see a larger image
Billing must be enabled – you will only be charged for usage, however, so you don’t have to spend a cent to use the API. You just have to have billing active.
Deleting an image blob doesn’t delete the image being served from the URL right away – that image will still be available for some time
Images must be uploaded to the blobstore, not the datastore as a blob, so it’s important to understand how the blobstore API works
The URLs of the created images are really, really ugly. If you need pretty URLs, it’s probably a better pattern to create a URL mapping to an HTML page that just displays the image in an IMG tag

Blobstore crash course

It’s best to start with a quick refresher course on the blobstore. Here’s the standard flow for a blobstore upload:

Create a new blobstore session and generate an upload URL for a form to POST to. This is done using the createUploadUrl() method of BlobstoreService. Pass a callback URL to this method. This URL is where the user will be forwarded after the upload has completed.
Present an upload form to the user. The action is the URL generated in step 1. Each URL must be unique: you cannot use the same URL for multiple sessions, as this will cause an error.
After the user has uploaded the file, they are forwarded to the callback URL in your App Engine application specified in step 1. The key of the uploaded blob is available to that callback handler (we’ll read it with getUploadedBlobs() below). Save this key and pass the user to their final destination.

Got it? Now we can talk about image serving.

Using the image serving URL

Once we have a blob key (step 3 of a Blobstore upload), we can do interesting things with it. First, we’ll need to create an instance of the ImagesService:

ImagesService imagesService = ImagesServiceFactory.getImagesService();

Once we have an instance, we pass the blob key to getServingUrl and get back a URL:

String imageUrl = imagesService.getServingUrl(blobKey);

This can sometimes take several hundred milliseconds to a few seconds to generate, so it’s almost always a good idea to run this on write as opposed to first read. Subsequent calls should be faster, but they may not be as fast as reading this value from a datastore entity property or memcache. Since this value doesn’t change, it’s a good idea to store it. On the local dev server, this URL looks something like this:

/_ah/img/eq871HJL_bYxhWQbTeYYoA

In production, however, this will return a URL that looks like this:

http://lh5.ggpht.com/2PQk0vDo8Bn8oiPba2gtGlDfd1ciD0H0MLrixcT12FCDQEm2oyMW9ErJX_-ZzOHBWbYBKzevK0BY6cxdZ3cxf_37


You’ve already saved yourself the trouble of writing a handler. What’s really nice about this URL is that you can perform operations on it just by appending parameters. Let’s say we wanted to resize our image to be no larger than 144×144 while retaining its aspect ratio. We’d simply append “=s144” to the end of the URL:

http://lh5.ggpht.com/2PQk0vDo8Bn8oiPba2gtGlDfd1ciD0H0MLrixcT12FCDQEm2oyMW9ErJX_-ZzOHBWbYBKzevK0BY6cxdZ3cxf_37=s144


We can also crop the image by appending a “-c” to the size parameter:

http://lh5.ggpht.com/2PQk0vDo8Bn8oiPba2gtGlDfd1ciD0H0MLrixcT12FCDQEm2oyMW9ErJX_-ZzOHBWbYBKzevK0BY6cxdZ3cxf_37=s144-c


Note that we can also generate these URLs programmatically using the overloaded version of getServingUrl that also accepts a size and crop parameter.
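If you’d rather build the resized URLs yourself from a stored serving URL, here’s a minimal sketch of appending the size and crop options described above. ServingUrls and resized() are hypothetical names; the only assumption about the URL format is the “=sN” and “-c” suffixes shown above.

```java
/**
 * Hypothetical helper: appends the "=sN" size option, and optionally the
 * "-c" crop flag, to a URL returned by ImagesService.getServingUrl().
 */
final class ServingUrls {
    static String resized(String servingUrl, int size, boolean crop) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        // "=s144" scales the image to fit in 144x144; adding "-c" also crops it.
        return servingUrl + "=s" + size + (crop ? "-c" : "");
    }
}
```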

Adding GWT

So now that we’ve got all that done, let’s get it working with GWT. It’s important that we understand how it all works, because GWT’s single-page, Javascript-generated content model must be taken into account. Let’s draw our upload widget. We’ll be using UiBinder:

We’ll create our Composite class as follows:

public class UploadPhoto extends Composite {

    private static UploadPhotoUiBinder uiBinder = GWT.create(UploadPhotoUiBinder.class);

    UserImageServiceAsync userImageService = GWT.create(UserImageService.class);

    interface UploadPhotoUiBinder extends UiBinder<Widget, UploadPhoto> {}

    @UiField
    Button uploadButton;

    @UiField
    FormPanel uploadForm;

    @UiField
    FileUpload uploadField;

    public UploadPhoto(final LoginInfo loginInfo) {
        initWidget(uiBinder.createAndBindUi(this));
    }

}

Here’s the corresponding XML file:

(We’ll add more to this later)

When we discussed the Blobstore, we mentioned that each upload form has a different POST location corresponding to the upload session. We’ll have to add a GWT-RPC component to generate and return a URL. Let’s do that now:

// UserImageService.java
@RemoteServiceRelativePath("images")
public interface UserImageService extends RemoteService  {
    public String getBlobstoreUploadUrl();
}

Our IDE will nag us to generate the corresponding Async interface if we have a GWT plugin:

// UserImageServiceAsync.java
public interface UserImageServiceAsync {
    public void getBlobstoreUploadUrl(AsyncCallback<String> callback);
}

We’ll need to write the code on the server side:

// UserImageServiceImpl.java
@SuppressWarnings("serial")
public class UserImageServiceImpl extends RemoteServiceServlet implements UserImageService {

    @Override
    public String getBlobstoreUploadUrl() {
        BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
        return blobstoreService.createUploadUrl("/upload");
    }

}

This is pretty straightforward. We’ll want to invoke this service on the client side when we build the form. Let’s add this to UploadPhoto:

public class UploadPhoto extends Composite {

    private static UploadPhotoUiBinder uiBinder = GWT.create(UploadPhotoUiBinder.class);
    UserImageServiceAsync userImageService = GWT.create(UserImageService.class);

    interface UploadPhotoUiBinder extends UiBinder<Widget, UploadPhoto> {}

    @UiField
    Button uploadButton;

    @UiField
    FormPanel uploadForm;

    @UiField
    FileUpload uploadField;

    public UploadPhoto() {
        initWidget(uiBinder.createAndBindUi(this));

        // Disable the button until we get the URL to POST to
        uploadButton.setText("Loading...");
        uploadForm.setEncoding(FormPanel.ENCODING_MULTIPART);
        uploadForm.setMethod(FormPanel.METHOD_POST);
        uploadButton.setEnabled(false);
        uploadField.setName("image");

        // Now we use our GWT-RPC service and get a URL
        startNewBlobstoreSession();

        // Once we've hit submit and it's complete, let's set the form to a new session.
        // We could also have probably done this on the onClick handler
        uploadForm.addSubmitCompleteHandler(new FormPanel.SubmitCompleteHandler() {

            @Override
            public void onSubmitComplete(SubmitCompleteEvent event) {
                uploadForm.reset();
                startNewBlobstoreSession();
            }
        });
    }

    private void startNewBlobstoreSession() {
        userImageService.getBlobstoreUploadUrl(new AsyncCallback<String>() {

            @Override
            public void onSuccess(String result) {
                uploadForm.setAction(result);
                uploadButton.setText("Upload");
                uploadButton.setEnabled(true);
            }

            @Override
            public void onFailure(Throwable caught) {
                // We probably want to do something here
            }
        });
    }

    @UiHandler("uploadButton")
    void onSubmit(ClickEvent e) {
        uploadForm.submit();
    }

}

This is fairly standard GWT RPC.

So that concludes the GWT part of it. We mentioned an upload callback. Let’s implement that now:

/**
 * @author Ikai Lan
 * 
 *         This is the servlet that handles the callback after the blobstore
 *         upload has completed. After the blobstore handler completes, it POSTs
 *         to the callback URL, which must return a redirect. We redirect to the
 *         GET portion of this servlet, which writes the image serving URL into
 *         the response body. GWT reads this URL from the form's submit results
 *         and shows the image that was uploaded. Note the content-type. We
 *         *need* to set this to text/html to get this to work.
 * 
 */
@SuppressWarnings("serial")
public class UploadServlet extends HttpServlet {
	private static final Logger log = Logger.getLogger(UploadServlet.class
			.getName());

	private BlobstoreService blobstoreService = BlobstoreServiceFactory
			.getBlobstoreService();

	public void doPost(HttpServletRequest req, HttpServletResponse res)
			throws ServletException, IOException {

		Map<String, BlobKey> blobs = blobstoreService.getUploadedBlobs(req);
		BlobKey blobKey = blobs.get("image");

		if (blobKey == null) {
			// Uh ... something went really wrong here
		} else {

			ImagesService imagesService = ImagesServiceFactory
					.getImagesService();

			// Get the image serving URL
			String imageUrl = imagesService.getServingUrl(blobKey);

			// For the sake of clarity, we'll use low-level entities
			Entity uploadedImage = new Entity("UploadedImage");
			uploadedImage.setProperty("blobKey", blobKey);
			uploadedImage.setProperty(UploadedImage.CREATED_AT, new Date());

			// Highly unlikely we'll ever filter on this property
			uploadedImage.setUnindexedProperty(UploadedImage.SERVING_URL,
					imageUrl);

			DatastoreService datastore = DatastoreServiceFactory
					.getDatastoreService();
			datastore.put(uploadedImage);

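			// URL-encode the serving URL so it survives as a query parameter
			res.sendRedirect("/upload?imageUrl="
					+ java.net.URLEncoder.encode(imageUrl, "UTF-8"));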
		}
	}

	@Override
	protected void doGet(HttpServletRequest req, HttpServletResponse resp)
			throws ServletException, IOException {

		String imageUrl = req.getParameter("imageUrl");
		resp.setHeader("Content-Type", "text/html");

		// This is a bit hacky, but it works: the client reads this URL from the
		// form's submit results and uses it to display the uploaded image
		resp.getWriter().println(imageUrl);

	}
}

We’ll probably want to display the image we just uploaded in the client. Let’s add a few lines of code to our SubmitCompleteHandler to do this:

	public void onSubmitComplete(SubmitCompleteEvent event) {
		uploadForm.reset();
		startNewBlobstoreSession();

		// This is what gets the result back - the content-type *must* be
		// text/html
		String imageUrl = event.getResults();
		Image image = new Image();
		image.setUrl(imageUrl);

		final PopupPanel imagePopup = new PopupPanel(true);
		imagePopup.setWidget(image);

		// Add some effects
		imagePopup.setAnimationEnabled(true); // animate opening the image
		imagePopup.setGlassEnabled(true); // darken everything under the image
		imagePopup.setAutoHideEnabled(true); // close image when the user clicks outside it
		imagePopup.center(); // center the image

	}

And we’re done!

Get the code

I’ve got the code for this project here:

http://github.com/ikai/gwt-gae-image-gallery

Just a warning: the code in the repository is a bit different from the sample code above. I wrote this post after I wrote the code, extracting the bare minimum needed to make this work. The repository version has experimental tagging and delete support, and it handles logins. I’m adding features to it simply to see what else can be done, so expect changes. I’m aware of a few of the bugs with the code, and I’ll get around to fixing them, but again, it’s a demo project, so keep realistic expectations. As far as I can tell, however, the code above should be runnable locally and deployable (once you have enabled billing for blobstore).

Happy developing!

Using the Java Mapper Framework for App Engine

Ikai Lan — Fri, 09 Jul 2010 10:11:00 +0000

The recently released Mapper framework is the first part of App Engine’s mapreduce offering. In this post, we’ll be discussing some of the types of operations we can perform using this framework and how easily they can be done.

Introduction to Map Reduce

If you aren’t familiar with Map Reduce, you can read a high-level overview on Wikipedia here. The official paper can be downloaded from this site if you’re interested in a more technical discussion.

The simplest breakdown of MapReduce is as follows:

Take a large dataset and break it into pieces, mapping individual pieces of data
Work on those mapped datasets and reduce them into the form you need

A simple example here is full text indexing. Suppose we wanted to create indexes from existing text documents. We would use the Map step to iterate over every document and “map” each phrase or term to a document, then we would “reduce” the mappings by writing them to an index. Map/reduce problems have two advantages: they are easy to conceptualize as problems that can be distributed and parallelized, and there are frameworks that handle many of the administrative functions of map-reduce: failure recovery, distribution of work, tracking status of jobs, reporting and so forth. The appengine-mapreduce project seeks to provide as many of these features as possible while making it as easy as possible for developers to write large batch processing jobs without having to think about the plumbing details.
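To make the indexing example concrete, here’s a minimal in-memory sketch of the map and reduce steps in plain Java. It runs in a single process and is not the appengine-mapreduce API; InvertedIndex, map() and reduce() are hypothetical names, and splitting on non-word characters is a simplifying assumption about how terms are tokenized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of map/reduce full text indexing (not the Mapper API). */
final class InvertedIndex {
    /** A (term, documentId) pair emitted by the map step. */
    static final class Pair {
        final String term;
        final String docId;

        Pair(String term, String docId) {
            this.term = term;
            this.docId = docId;
        }
    }

    /** Map: turn one document into (term, documentId) pairs. */
    static List<Pair> map(String docId, String text) {
        List<Pair> out = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                out.add(new Pair(term, docId));
            }
        }
        return out;
    }

    /** Reduce: group the emitted pairs by term into an index. */
    static Map<String, Set<String>> reduce(List<Pair> pairs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Pair p : pairs) {
            index.computeIfAbsent(p.term, k -> new TreeSet<>()).add(p.docId);
        }
        return index;
    }

    /** Runs map over every document, then reduce over everything emitted. */
    static Map<String, Set<String>> build(Map<String, String> documents) {
        List<Pair> emitted = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            emitted.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(emitted);
    }
}
```

Because each map() call only looks at one document, a framework is free to run those calls on many machines at once.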

But I only have Map available!

Yes, this is true – as of the writing of this post, only the “map” step exists, which is why it’s currently referred to as the “Mapper API”. That doesn’t mean it’s not useful. For starters, it is a very easy way to perform some operation on every single Entity of a given Kind in your datastore in parallel. What would you have to build for yourself if Mapper weren’t available?

Begin querying over every Entity in chained Task Queues
Store beginning and end cursors (introduced in 1.3.5)
Create tasks to work with chunks of your datastore
Write the code to manipulate your data
Build an interface to control your batch jobs
Build a callback system for your multitudes of parallelized workers to call when the entire task has completed
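To get a feel for that plumbing, here is a toy, in-memory version of three items on the list: splitting the work into chunks, running the chunks in parallel, and firing a callback once every chunk has finished. It is a plain Java sketch that uses a thread pool and a CountDownLatch as stand-ins for chained Task Queue tasks and datastore cursors; ChunkedJob and its parameters are hypothetical, and it does nothing about the retries, persistence or status UI a real batch job needs.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: split items into chunks, process chunks in parallel,
// then invoke a completion callback once every chunk has finished.
class ChunkedJob {

    public static <T> void run(List<T> items, int chunkSize, Consumer<List<T>> worker,
                               Runnable onComplete) throws InterruptedException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int chunkCount = (items.size() + chunkSize - 1) / chunkSize;
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(chunkCount, 4)));
        CountDownLatch remaining = new CountDownLatch(chunkCount);
        try {
            for (int start = 0; start < items.size(); start += chunkSize) {
                // Each chunk plays the role of one task working on a slice of the data.
                List<T> chunk = items.subList(start, Math.min(start + chunkSize, items.size()));
                pool.execute(() -> {
                    try {
                        worker.accept(chunk);
                    } finally {
                        remaining.countDown(); // report in even if the worker throws
                    }
                });
            }
            remaining.await();   // block until every chunk has reported in
            onComplete.run();    // the "whole job is done" callback
        } finally {
            pool.shutdown();
        }
    }
}
```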

It’s certainly not a trivial amount of work. Some things you can do very easily with the Mapper library include:

Modify some property or set of properties for every Entity of a given Kind
Delete all entities of a single Kind – the functional equivalent of a “DROP TABLE” if you were using a relational database
Count the occurrences of some property across every single Entity of a given Kind in your datastore

We’ll go through a few of these examples in this post.

Our Sample application

Our sample application will be a modified version of the Guestbook demo. We’ll add a few additional properties. For simplicity, we’ll use the low-level API, since the Mapper API also uses the low-level API. The application is live at http://ikai-mapper-demo.appspot.com and the code is also available to clone via GitHub if you’d like to follow along.

How to define a Mapper

There are four steps to defining a Mapper:

Download, build and place the appengine-mapreduce JAR files in your WEB-INF/lib directory and add them to your build path. You only need to do this once per project. The steps for doing this are on the “Getting Started” page for Java. You’ll need all the JAR files that are built.
Make sure that we have a DESCENDING index created on Key. This is important! If we run our Mapper locally, this index will automatically be added to our datastore-indexes.xml configuration, and it will be created when we deploy our application. One trick to ensure that indexes get built before they are needed, at least in a live application, is to create and deploy an application with the new index configuration to a non-default version. Because all versions use the same datastore and the same set of indexes, this will schedule the index to be built before we need it in the live version. When it has completed, we simply switch the default version over, and we’re ready to roll.
Create your Mapper class
Configure your Mapper class in mapreduce.xml

We’ll go over steps 3 and 4 in each example.

Example 1: Changing a property on every Entity (Naive way)

(You can even use this technique if you just need to change a property on a large set of Entities).

Assuming you’ve already set up your environment for the Mapper servlet, you can dive right in. Let’s create a Mapper class that goes through every Entity of a given Kind and converts the “comment” property to all lowercase letters. We’ll also add a timestamp recording when we modified this Entity. In this first example, we’ll do this the naive way, which is a good introduction to simple mutations on all your Entities using Mapper.

Note that this requires some familiarity with the Low-Level API. Don’t worry – entities edited or saved using the low-level API are accessible via managed persistence interfaces such as JDO/JPA (and vice versa). If you aren’t familiar with the low-level API, you can read more about it here on the Javadocs.

The first thing we’ll have to do is define a Mapper. We tried as much as possible to mimic Hadoop’s Mapper class. We’ll be subclassing AppEngineMapper, which is itself a subclass of Hadoop’s Mapper. The meat of this class is the map() method, which we’ll be overriding. We’ll also override the taskSetup() lifecycle callback. We’ll be using this to initialize our DatastoreService, though we could probably initialize it in the body of the map() method itself. The other methods are taskCleanup(), setup() and cleanup() – examples here. Let’s have a look at our code below:

package com.ikai.mapperdemo.mappers;
 
import java.util.Date;
import java.util.logging.Logger;
 
import org.apache.hadoop.io.NullWritable;
 
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.tools.mapreduce.AppEngineMapper;
 
/**
 *
 * This mapper changes all Strings to lowercase Strings, sets
 * a timestamp, and reputs them into the Datastore. The reason
 * this is a "Naive" Mapper is because it doesn't make use of
 * Mutation Pools, which can do these operations in batch instead
 * of individually.
 *
 * @author Ikai Lan
 *
 */
public class NaiveToLowercaseMapper extends
        AppEngineMapper<Key, Entity, NullWritable, NullWritable> {
    private static final Logger log = Logger
            .getLogger(NaiveToLowercaseMapper.class.getName());
 
    private DatastoreService datastore;
 
    @Override
    public void taskSetup(Context context) {
        this.datastore = DatastoreServiceFactory.getDatastoreService();
    }
 
    @Override
    public void map(Key key, Entity value, Context context) {
        log.info("Mapping key: " + key);
 
        if (value.hasProperty("comment")) {
            String comment = (String) value.getProperty("comment");
            comment = comment.toLowerCase();
            value.setProperty("comment", comment);
            value.setProperty("updatedAt", new Date());
 
            datastore.put(value);
 
        }
    }
}

Notice that this map method takes 3 parameters:

Key key – this is the datastore Key for the Entity we are about to perform an operation on. It mostly exists for API compatibility with Hadoop. When iterating over datastore Entities, we don’t really need it: we *could* use it to look up the Entity, but we don’t have to because …

Entity value – … because we actually get the Entity already. If we did a lookup for the Entity, we’d double the amount of lookups we do per Entity. We can certainly use the Key to do a lookup using a PersistenceManager or EntityManager and have a populated, typesafe Entity object, but from an efficiency standpoint we’d be doubling our work for some JDO/JPA sugar.

Context context – We don’t need this in our example, but it’s easy to think of the Context as giving us access to “global” values such as temporary variables and configuration files. For a later example in this post, we’ll be using the Context to store a global value in a counter and increment it. For this example, it’s unused.

If you’re familiar at all with the low-level API, this will look very straightforward (again, I highly encourage you to read the docs). We take an entity, lowercase its “comment” property, set an “updatedAt” timestamp, then put() the Entity back into the datastore.

Now let’s add this job to mapreduce.xml:

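<configurations>
  <configuration name="Naive Lowercase Mapper">
    <property>
      <name>mapreduce.map.class</name>
      <!-- The AppEngineMapper subclass to run -->
      <value>com.ikai.mapperdemo.mappers.NaiveToLowercaseMapper</value>
    </property>

    <!-- Input format that iterates over datastore entities -->
    <property>
      <name>mapreduce.inputformat.class</name>
      <value>com.google.appengine.tools.mapreduce.DatastoreInputFormat</value>
    </property>

    <property>
      <name>mapreduce.mapper.inputformat.datastoreinputformat.entitykind</name>
      <value>Comment</value>
    </property>
  </configuration>
</configurations>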

It looks complex, but it’s really not. We define a configuration element and name the job. The name of the job is also the name we’ll see in the GUI when we fire off the job. We need 3 sets of property elements under this element, which are just name/value pairs. Let’s go over each one we used:

Name: mapreduce.map.class
Value: com.ikai.mapperdemo.mappers.NaiveToLowercaseMapper
This one is straightforward – we provide the name of an AppEngineMapper subclass with the map() method we want run.

Name: mapreduce.inputformat.class
Value: com.google.appengine.tools.mapreduce.DatastoreInputFormat
This is a class that takes some input to map over. DatastoreInputFormat is provided by appengine-mapreduce, but it is possible for us to define our own input formatter. For guidance, check out the source of DatastoreInputFormat here.

In a more advanced example (ahem, future blog post), we’ll discuss building our own InputFormat to read from another source such as the Blobstore. For our examples in this post, we won’t need anything beyond DatastoreInputFormat.

Name: mapreduce.mapper.inputformat.datastoreinputformat.entitykind
Value: Comment
This input is specific to DatastoreInputFormat. It tells DatastoreInputFormat which Entity Kind to iterate over. Note that in the mapper console, a user can type in the name of a Kind or edit this field to reflect the value they want. We can’t leave this blank, though, if we want this to work.

When we click “Run”, the job appears under “Running jobs”. We can click “Detail” to see the progress of our job, or “Abort” to stop it. Note that aborting a job won’t revert our Entities! If we abort a giant mutation partway through, we’ll end up with a partially run job, so we’ll have to be cognizant of this when we use this tool.

When the job completes, we’ll take a look at our Comments. Sure enough, they are now all lowercase.

Example 2: Changing a property on every Entity using Mutation Pools

There’s a reason the Mapper in Example 1 is called a Naive Mapper: it doesn’t take advantage of mutation pools. App Engine’s datastore can write many entities in a single batched call. We’re already doing work in parallel by specifying shards, but we’ll also want to use batched calls when possible. We do this by adding the mutations we want to a mutation pool; periodically, as the pool hits a certain size, we flush all the writes to the datastore with a single call instead of one call per entity. This makes each map() call as fast as possible, since all we’re really doing is queueing up operations to perform all at once when the system is good and ready. Let’s define the XML file first, assuming we call the class PooledToLowercaseMapper:

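<configuration name="Pooled Lowercase Mapper">
  <property>
    <name>mapreduce.map.class</name>
    <!-- The AppEngineMapper subclass to run -->
    <value>com.ikai.mapperdemo.mappers.PooledToLowercaseMapper</value>
  </property>

  <!-- Input format that iterates over datastore entities -->
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>com.google.appengine.tools.mapreduce.DatastoreInputFormat</value>
  </property>

  <property>
    <name>mapreduce.mapper.inputformat.datastoreinputformat.entitykind</name>
    <value>Comment</value>
  </property>
</configuration>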

It looks almost exactly the same. That’s because the meat is in what we do in the actual class itself:

package com.ikai.mapperdemo.mappers;
 
import java.util.Date;
import java.util.logging.Logger;
 
import org.apache.hadoop.io.NullWritable;
 
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.tools.mapreduce.AppEngineMapper;
import com.google.appengine.tools.mapreduce.DatastoreMutationPool;
 
/**
 *
 * The functionality of this is exactly the same as in {@link NaiveToLowercaseMapper}.
 * The advantage here is that since a {@link DatastoreMutationPool} is used, mutations
 * can be done in batch, saving API calls.
 *
 * @author Ikai Lan
 *
 */
public class PooledToLowercaseMapper extends
        AppEngineMapper<Key, Entity, NullWritable, NullWritable> {
    private static final Logger log = Logger
            .getLogger(PooledToLowercaseMapper.class.getName());
 
    @Override
    public void map(Key key, Entity value, Context context) {
        log.info("Mapping key: " + key);
 
        if (value.hasProperty("comment")) {
            String comment = (String) value.getProperty("comment");
            comment = comment.toLowerCase();
            value.setProperty("comment", comment);
            value.setProperty("updatedAt", new Date());
 
            DatastoreMutationPool mutationPool = this.getAppEngineContext(
                    context).getMutationPool();
            mutationPool.put(value);
        }
    }
}

Aha! So we finally put the context to use. Granted, we use the context as a parameter to another, more useful method, but at least we’re using it. We acquire a DatastoreMutationPool using the getAppEngineContext(context).getMutationPool() method, then we just call put() and pass the changed entity. DatastoreMutationPool is defined here and is open source.

The interface is similar to that of DatastoreService. There’s not a lot of fancy stuff going on here. put(), as we’ve seen, is defined. get() isn’t, because, well, that method makes no sense in this context. delete() is defined, which brings me to my bonus section:

Bonus Example 2: Delete all Entities of a given Kind

One of the most common questions asked in the group is, “How do I drop table?” Usually, this question is asked by new App Engine developers who don’t yet understand that the datastore is a distributed key-value store and not a relational database. But it’s also a legitimate use case. What if you just wanted to nuke all Entities of a given Kind? Prior to Mapper, you would have had to write your own handler to take care of this. Mapper makes this very easy. Here’s what a generic “DeleteAllMapper” would look like. This will work with *any* Entity Kind:

package com.ikai.mapperdemo.mappers;
 
import java.util.logging.Logger;
 
import org.apache.hadoop.io.NullWritable;
 
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.tools.mapreduce.AppEngineMapper;
import com.google.appengine.tools.mapreduce.DatastoreMutationPool;
 
/**
 *
 * This Mapper deletes all Entities of a given kind. It simulates the
 * DROP TABLE functionality asked for by developers.
 *
 * @author Ikai Lan
 *
 */
public class DeleteAllMapper extends
        AppEngineMapper<Key, Entity, NullWritable, NullWritable> {
    private static final Logger log = Logger.getLogger(DeleteAllMapper.class
            .getName());
 
    @Override
    public void map(Key key, Entity value, Context context) {
        log.info("Adding key to deletion pool: " + key);
        DatastoreMutationPool mutationPool = this.getAppEngineContext(context)
                .getMutationPool();
        mutationPool.delete(value.getKey());
    }
}

That’s it! We wire it up the same way we wire up other Mappers:

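<configuration name="Delete All Mapper">
  <property>
    <name>mapreduce.map.class</name>
    <!-- The AppEngineMapper subclass to run -->
    <value>com.ikai.mapperdemo.mappers.DeleteAllMapper</value>
  </property>

  <!-- Input format that iterates over datastore entities -->
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>com.google.appengine.tools.mapreduce.DatastoreInputFormat</value>
  </property>

  <property>
    <name>mapreduce.mapper.inputformat.datastoreinputformat.entitykind</name>
    <value>Comment</value>
  </property>
</configuration>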
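Both PooledToLowercaseMapper and DeleteAllMapper hand their writes to a DatastoreMutationPool instead of calling the datastore directly. The idea behind a mutation pool (buffer mutations, then write them in one batch once enough have accumulated) can be sketched in plain Java. BatchingPool below is a hypothetical toy, not the DatastoreMutationPool API; the batch size and the writer callback are assumptions, with the callback standing in for a single batched datastore call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy write buffer that flushes in batches (not the App Engine API).
class BatchingPool<T> {
    private final int batchSize;
    private final Consumer<List<T>> batchWriter; // stands in for one batched write call
    private final List<T> pending = new ArrayList<>();

    BatchingPool(int batchSize, Consumer<List<T>> batchWriter) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    // Queue one mutation; write the whole batch once it reaches batchSize.
    public void put(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Write whatever is left over, e.g. when a shard finishes.
    public void flush() {
        if (!pending.isEmpty()) {
            batchWriter.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With a batch size of 2, putting the items 1 through 5 writes the batches [1, 2] and [3, 4] automatically, and a final flush() writes [5].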

Get the code

You’re undoubtedly ready to start playing with this thing. You’ve got everything you need to know. First, here’s the getting started page for appengine-mapreduce in Java:

Here’s my sample source code on GitHub.

Summary

So there you have it: an easy to use tool for mapping operations across entire Entity Kinds. There are still a lot of topics to cover, and we’ll likely explore them in a future article. For instance, I didn’t have a chance to cover building your own InputFormat class. We’re still hard at work extending this framework (such as the “Shuffle” and “Reduce” phases), so please post your feedback in the App Engine groups or file bugs in the issue tracker.

Using Asynchronous URLFetch on Java App Engine

Ikai Lan — Tue, 29 Jun 2010 20:06:00 +0000

Developers building applications on top of Java App Engine can use the familiar java.net interface for making off-network calls. For simple requests, this should be more than sufficient. The low-level API, however, provides one feature not available in java.net: asynchronous URLFetch.

The low-level URLFetch API

So what does the low-level API look like? Something like this:

import java.io.IOException;
import java.net.URL;
import java.util.List;
 
import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        try {
            URL url = new URL("http://someurl.com");
            HTTPResponse response = fetcher.fetch(url);
 
            byte[] content = response.getContent();
            // if redirects are followed, this returns the final URL we are redirected to
            URL finalUrl = response.getFinalUrl();

            // 200, 404, 500, etc
            int responseCode = response.getResponseCode();
            List<HTTPHeader> headers = response.getHeaders();

            for(HTTPHeader header : headers) {
                String headerName = header.getName();
                String headerValue = header.getValue();
            }

        } catch (IOException e) {
            // fetch() throws IOException if the URL can't be fetched; new URL() can throw
            // MalformedURLException (also an IOException), though not for this hard-coded URL
        }

The full Javadocs are here.

So it’s a bit different from the standard java.net interface, where we’d get back a reader and iterate line by line on the response. Here the whole body comes back as a byte array, but we’re protected from running out of heap because URLFetch is limited to 1 MB responses.

Asynchronous vs. Synchronous requests

Using java.net has the advantage of portability – you could build a standard fetcher that will work in any JVM environment, even those outside of App Engine. The tradeoff, of course, is that App Engine specific features won’t be present. The one killer feature of App Engine’s low-level API that isn’t present in java.net is asynchronous URLFetch. What is asynchronous fetch? Let’s make an analogy:

Let’s pretend you, like me at home, are on DSL and have a pretty pathetic downstream speed, and decide to check out a link sharing site like Digg. You browse to the front page and decide to open up every single link. You can do this synchronously or asynchronously.

Synchronously

Synchronously, you click link #1. Now you look at this page. When you are done looking at this page, you hit the back button and click link #2. Repeat until you have seen all the pages. Now, again, you are on DSL, so not only do you spend time reading each link, before you read each destination page, you have to wait for the page to load. This can take a significant portion of your time. The total amount of time you sit in front of your computer is thus:

N = number of links
L = loading time per page
R = reading time per page

N * (L + R)

(Yes, before I wrote this equation, I thought that including some mathematical formulas in my blog post would make me look smarter, but as it turns out the equation is something comprehensible by 8 year olds internationally/American public high school students)

Asynchronously

Using a tabbed browser, you control-click every single link on the page to open them in new tabs. Now before you go look at any of the pages, you decide to go to the kitchen and make a grilled cheese sandwich. When you are done with the sandwich, you come back to your computer sit down, enjoying your nice, toasty sandwich while you read articles about Ron Paul and look at funny pictures of cats. How much time are you spending?

N = number of links
S = loading time for the slowest loading page
R = reading time per page
G = time to make a grilled cheese sandwich

MAX((N * R + G), (N * R + S))

Which takes longer, your DSL, or the time it takes you to make a grilled cheese sandwich? The point that I’m making here is that you can save time by parallelizing things. No, I know it’s not a perfect analogy, as downloading N pages in parallel consumes the same crappy DSL connection, but you get what I am trying to say. Hopefully. And maybe you are also in the mood for some cheese.
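Plugging some made-up numbers into the two formulas shows the difference. The numbers are purely illustrative assumptions: 10 links, 30 seconds to load each page, 45 seconds for the slowest page, 60 seconds to read each page and 180 seconds to make the sandwich. Note that MAX((N * R + G), (N * R + S)) is the same as N * R + MAX(G, S): the reading time is fixed, and the page loading hides behind the sandwich-making.

```java
// Worked example of the two formulas from the browsing analogy (times in seconds).
class BrowsingTime {

    // Synchronous: N * (L + R); every page is loaded, then read, one after another.
    public static double synchronous(int n, double loadTime, double readTime) {
        return n * (loadTime + readTime);
    }

    // Asynchronous: MAX((N * R + G), (N * R + S)); pages load while the sandwich is made.
    public static double asynchronous(int n, double slowestLoad, double readTime, double sandwichTime) {
        return Math.max(n * readTime + sandwichTime, n * readTime + slowestLoad);
    }

    public static void main(String[] args) {
        System.out.println(synchronous(10, 30, 60));        // 900.0
        System.out.println(asynchronous(10, 45, 60, 180));  // 780.0
    }
}
```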

Asynchronous URLFetch in App Engine

So what would the URLFetch above look like asynchronously? Probably something like this:

import java.io.IOException;
import java.net.URL;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
 
import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
    protected void makeAsyncRequest() {
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        try {
            URL url = new URL("http://someurl.com");
            Future<HTTPResponse> future = fetcher.fetchAsync(url);
 
            // Other stuff can happen here!
 
            HTTPResponse response = future.get();
            byte[] content = response.getContent();
            URL finalUrl = response.getFinalUrl();
            int responseCode = response.getResponseCode();
            List<HTTPHeader> headers = response.getHeaders();
 
            for(HTTPHeader header : headers) {
                String headerName = header.getName();
                String headerValue = header.getValue();
            }
 
        } catch (IOException e) {
            // new URL() throws MalformedURLException (an IOException), which can't happen for this hard-coded URL
        } catch (InterruptedException e) {
            // Thrown by java.util.concurrent.Future.get() if this thread is interrupted while waiting
        } catch (ExecutionException e) {
            // Thrown by java.util.concurrent.Future.get() if the fetch itself failed
            e.printStackTrace();
        }
 
    }

This looks pretty similar – EXCEPT: fetchAsync doesn’t return an HTTPResponse. It returns a Future<HTTPResponse>. What is a Future?

java.util.concurrent.Future

From the Javadocs:

“A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled.”


What does this mean in English? It means that the Future object is NOT the response of the HTTP call. We can’t get the response until we call the get() method. Between calling fetchAsync() and calling get(), we can do other stuff: datastore operations, insert things into the Task Queue, heck, we can even do more URLFetch operations. When we finally DO call get(), one of two things happens:

We’ve already retrieved the URL. Return an HTTPResponse object
We’re still retrieving the URL. Block until we are done, then return an HTTPResponse object.

In the best case scenario, the other work takes about as long as the URLFetch, so we cut the total time roughly in half. In the worst case scenario, the total time is still just the time it takes to do the URLFetch or the other operations, whichever takes longer, rather than the two added together. The end result is that we lower the amount of time it takes to return a response to the end user.
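The same get() semantics can be seen outside App Engine with a plain java.util.concurrent ExecutorService. In this sketch, slowFetch() is a hypothetical stand-in for a remote call (it just sleeps to simulate latency). The point is that submit() returns a Future immediately, the loop runs while the “fetch” is still in flight, and get() blocks only for whatever time is left.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch of overlapping other work with a slow call via a Future.
class FutureDemo {

    // Stand-in for a remote fetch: sleeping simulates network latency.
    static String slowFetch(String url) throws InterruptedException {
        Thread.sleep(200);
        return "body of " + url;
    }

    public static String fetchWhileWorking(ExecutorService pool, String url)
            throws InterruptedException, ExecutionException {
        // submit() starts the call in the background and returns right away.
        Future<String> future = pool.submit(() -> slowFetch(url));

        // "Other stuff can happen here!" This runs while the fetch is in flight.
        long otherWork = 0;
        for (int i = 0; i < 1000; i++) {
            otherWork += i;
        }

        // get() returns immediately if the fetch is done, otherwise blocks until it is.
        String body = future.get();
        return body + " (other work = " + otherWork + ")";
    }
}
```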

Twitter Example

So let’s build a servlet that retrieves my tweets. Just for giggles, let’s do it 20 times and see what the performance difference is. We’ll make it so that if we pass a URL parameter, async=true (or anything, for simplicity), we do the same operation using fetchAsync. The code is below:

package com.ikai.urlfetchdemo;
 
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
 
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
 
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
@SuppressWarnings("serial")
public class GetTwitterFeedServlet extends HttpServlet {
 
    protected static String IKAI_TWITTER_RSS = "http://twitter.com/statuses/user_timeline/14437022.rss";
 
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
 
        boolean isSyncRequest = true;
 
        if(req.getParameter("async") != null) {
            isSyncRequest = false;
        }
 
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("Twitter feed fetch demo");
 
        long startTime = System.currentTimeMillis();
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        URL url = new URL(IKAI_TWITTER_RSS);
 
        if(isSyncRequest) {
            out.println("Synchronous fetch");
            for(int i = 0; i < 20; i++) {
                HTTPResponse response = fetcher.fetch(url);
                printResponse(out, response);
            }
        } else {
            out.println("Asynchronous fetch");
            ArrayList<Future<HTTPResponse>> asyncResponses = new ArrayList<Future<HTTPResponse>>();
            for(int i = 0; i < 20; i++) {
                Future<HTTPResponse> responseFuture = fetcher.fetchAsync(url);
                asyncResponses.add(responseFuture);
            }

            for(Future<HTTPResponse> future : asyncResponses){
                HTTPResponse response;
                try {
                    response = future.get();
                    printResponse(out, response);
                } catch (InterruptedException e) {
                    // Guess you would do something here
                } catch (ExecutionException e) {
                    // Guess you would do something here
                }
            }
 
        }
 
        long totalProcessingTime = System.currentTimeMillis() - startTime;
        out.println("Total processing time: " + totalProcessingTime + "ms");
    }
 
    private void printResponse(PrintWriter out, HTTPResponse response) {
        out.println("");
        out.println("Response: " + new String(response.getContent()));
        out.println("");
    }
 
}

As you can see, it’s a bit more involved to store all the Futures in a list and then iterate through them. We’re also not being too intelligent about iterating through the Futures: we call get() on them in the order we created them, as if URLFetch completed requests first-in-first-out (FIFO), which may or may not be the case in production. If an early request is slow, it holds up responses that have already arrived behind it. A more optimized version might retrieve responses from calls we know are faster before those we know are slower. Even so, empirical testing will show that more often than not, doing things asynchronously is significantly faster for the user than doing them synchronously.
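One way to keep a slow early response from holding up faster ones is to process results in completion order rather than submission order. The sketch below shows that technique with the standard ExecutorCompletionService and simulated fetches (each task just sleeps for a given delay); it is not part of the URLFetch API, and CompletionOrderDemo and its delays are hypothetical. With the Futures returned by fetchAsync(), a similar effect could be approximated by checking isDone() before calling get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: handle simulated responses in the order they finish.
class CompletionOrderDemo {

    public static List<String> fetchAll(List<Integer> delaysMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, delaysMillis.size()));
        try {
            CompletionService<String> completion = new ExecutorCompletionService<>(pool);
            for (int delay : delaysMillis) {
                completion.submit(() -> {
                    Thread.sleep(delay); // simulated network latency
                    return "response after " + delay + "ms";
                });
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < delaysMillis.size(); i++) {
                // take() blocks until the next task, whichever it is, completes.
                results.add(completion.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

For delays of 500 ms and 10 ms, the 10 ms response is handled first even though it was submitted second.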

Using Asynchronous URLFetch and HTTP POST

So far, our examples have been focused on read operations. What if we don’t care about the response? For instance, what if we decide to make an API call that essentially is a “write” operation, and can, for the most part, safely assume it will succeed?

// JavaAsyncUrlFetchDemoServlet.java
package com.ikai.urlfetchdemo;
 
import java.io.IOException;
 
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
 
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
@SuppressWarnings("serial")
public class JavaAsyncUrlFetchDemoServlet extends HttpServlet {
 
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
 
        long startTime = System.currentTimeMillis();
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        fetcher.fetchAsync(FetchHelper.makeGuestbookPostRequest("Async", "At" + startTime));
        long totalProcessingTime = System.currentTimeMillis() - startTime;
 
        resp.setContentType("text/html");
        resp.getWriter().println("Asynchronous fetch demo");
        resp.getWriter().println("Total processing time: " + totalProcessingTime + "ms");
    }
 
}

// FetchHelper.java
package com.ikai.urlfetchdemo;
 
import java.net.MalformedURLException;
import java.net.URL;
 
import com.google.appengine.api.urlfetch.HTTPMethod;
import com.google.appengine.api.urlfetch.HTTPRequest;
 
public class FetchHelper {
 
    protected static final String signGuestBookUrl = "http://bootcamp-demo.appspot.com/sign";
 
    public static HTTPRequest makeGuestbookPostRequest(String name, String content){
        HTTPRequest request = null;
        URL url;
        try {
            url = new URL(signGuestBookUrl);
            request = new HTTPRequest(url, HTTPMethod.POST);
            String body = "name=" + name + "&content=" + content;
            request.setPayload(body.getBytes());
 
        } catch (MalformedURLException e) {
            // Do nothing
        }
        return request;
    }
}

I’ve decided to spam my own guestbook here, for better or for worse.
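makeGuestbookPostRequest() builds its payload by concatenating the raw values, so a name or comment containing &, = or spaces would corrupt the form body. Here is a minimal sketch of URL-encoding the fields first, using only java.net.URLEncoder (the Charset overload needs Java 10 or later); FormBody is a hypothetical helper, not part of App Engine.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: build an application/x-www-form-urlencoded body.
class FormBody {

    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Encode keys and values so '&', '=' and spaces can't break the format.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

Inside makeGuestbookPostRequest(), the payload line could then become request.setPayload(FormBody.encode(fields).getBytes(StandardCharsets.UTF_8)), where fields is a LinkedHashMap holding “name” and “content” so their order is preserved.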

Download the code

You can download the code from this post here using git: http://github.com/ikai/Java-App-Engine-Async-URLFetch-Demo

Using the bulkloader with Java App Engine

Ikai Lan — Thu, 10 Jun 2010 19:46:00 +0000

The latest release of the datastore bulkloader greatly simplifies import and export of data from App Engine applications for developers. We’ll go through a step by step example for using this tool with a Java application. Note that only setting up Remote API is Java specific – everything else works the same way for Python applications. Unlike certain phone companies, this is one store that doesn’t care what language your application is written in.

Checking for our Prerequisites:

If you already have Python 2.5.x and the Python SDK installed, skip this section.

First off, we’ll need to download the Python SDK. This example assumes we have Python version 2.5.x installed. If you’re not sure what version you have installed, open up a terminal and type “python”. This opens up a Python REPL, with the first line displaying the version of Python you’re using. Here’s example output:

Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

(Yes, Pythonistas, the version on my laptop is ooooooooold).

Download the Python SDK from the following link. As of the writing of this post, the newest version is 1.3.4: Direct link.

Unzip this file. It’ll be easier for you if you place the unzipped directory in your PATH. Linux and OS X users can append this to their ~/.bash_profile:

PATH="/path/to/where/you/unzipped/appengine:${PATH}"
export PATH

To test that everything is working, type:

appcfg.py

You’ll see a page of usage commands that starts out like this:

Usage: appcfg.py [options] <action>
 
Action must be one of:
create_bulkloader_config: Create a bulkloader.yaml from a running application.
cron_info: Display information about cron jobs.
download_data: Download entities from datastore.
help: Print help for a specific action.
request_logs: Write request logs in Apache common log format.
rollback: Rollback an in-progress update.
update: Create or update an app version.
update_cron: Update application cron definitions.
update_dos: Update application dos definitions.
update_indexes: Update application indexes.
update_queues: Update application task queue definitions.
upload_data: Upload data records to datastore.
vacuum_indexes: Delete unused indexes from application.
Use 'help <action>' for a detailed description.

…. (and so forth)

Now we can go ahead and start using the bulkloader.

Using pattern matching with regular expressions in Scala

Ikai Lan — Sat, 04 Apr 2009 19:54:00 +0000

I’ve been trying to use Scala more and more so I can gain some experience and exposure to it. A couple of weeks ago, I wrote a Scala log parser for Ruby on Rails. It is terribly newbie-ish – the classes are mutable and it’s disorganized. It’s a mess. Jorge Ortiz from the Scala mailing list was kind enough to rewrite it in a more Scala style. It completely blew my mind how terse Scala can become when written correctly.

It bothered me, however, dealing with regular expressions the way that I did. The Java interface is pretty clumsy and nowhere near as clean as regular expression pattern extraction in Perl or Ruby.
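For comparison, here is roughly what pulling the same fields out of a Rails log line looks like with java.util.regex. It is a sketch using the log format from the example that follows; the class name LogLineJava and the array-of-groups return value are illustrative assumptions. Note the doubled backslashes and the explicit Matcher and group() calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java equivalent of the Scala LogEntry extractor.
class LogLineJava {

    private static final Pattern LOG_ENTRY = Pattern.compile(
            "Completed in (\\d+)ms \\(View: (\\d+), DB: (\\d+)\\) \\| (\\d+) OK \\[http://app.domain.com(.*)\\?.*");

    // Returns {totalTime, viewTime, dbTime, responseCode, uri}, or null if the line doesn't match.
    public static String[] parse(String line) {
        Matcher m = LOG_ENTRY.matcher(line);
        if (!m.matches()) { // the whole line must match, as with Scala's extractor
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }
}
```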

As it turns out, it’s surprisingly easy to extract text using Regular Expressions in Scala. Throw away Pattern.compile! Check out this hotness below:

First, let’s import Scala’s regex package:

import scala.util.matching.Regex

Now we declare a regular expression to match against. We can do this in one of two ways:

val LogEntry = new Regex("""Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""")

I use triple quotes here to create a raw string. In a raw string, I do not need to escape characters like the backslash (\). If I didn’t do this, I’d be forced to write strings like “\\d+”. Believe it or not, that extra backslash throws me off. Just goes to show that I have written way too many parsers.
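
Here's a quick way to convince yourself that the two spellings are the same string. The variable names are just for illustration:

```scala
// Triple quotes make a raw string: the backslash is kept as-is.
val raw = """\d+"""
// In an ordinary string literal the backslash itself has to be escaped.
val escaped = "\\d+"

// Both are the same three characters: \ d +
println(raw == escaped) // prints true
```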

Alternatively, I can declare a new Regex by doing this:

val LogEntry = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""".r

Strings have a method called “r” that converts them to a Regex object. I’m not sold on this syntax at the moment, since the trailing .r is easy to miss when scanning the code by eye, but I’m putting it here for those folks that absolutely need to save characters.
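
If you want to check that both forms give you the same thing, you can compare their pattern strings with Regex's regex method. The viaConstructor and viaR names below are purely illustrative:

```scala
import scala.util.matching.Regex

// The constructor form...
val viaConstructor: Regex = new Regex("""Completed in (\d+)ms""")
// ...and the .r shorthand available on any String.
val viaR: Regex = """Completed in (\d+)ms""".r

// Regex#regex returns the underlying pattern string.
println(viaConstructor.regex == viaR.regex) // prints true
```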

There’s nothing really special here yet. The next step is REALLY cool:

val line = "Completed in 100ms (View: 25, DB: 75) | 200 OK [http://app.domain.com?params=here]"

scala> val LogEntry(totalTime, viewTime, dbTime, responseCode, uri) = line
totalTime: String = 100
viewTime: String = 25
dbTime: String = 75
responseCode: String = 200
uri: String =

The local variables totalTime, viewTime, dbTime, responseCode and uri are now bound to the values we want to extract from the original line! (uri is empty here because the sample URL has no path before the query string.) A Regex object defines an unapplySeq method. I’m not quite good enough at Scala to tell you in any definite terms what that means, except that it lets you use the regex in a pattern match:

line match {
  case LogEntry(totalTime, viewTime, dbTime, responseCode, uri) => {
    /* Process the data */
    // do something with totalTime.toInt
    // do something with viewTime.toInt
    // etc ...
  }
  case _ => // Do nothing
}
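
As far as I understand it, unapplySeq is what makes a Regex an extractor. When the compiler sees a pattern like LogEntry(...), it calls LogEntry.unapplySeq on the value being matched. That call returns Some(list of captured groups) if the whole string matches, and None otherwise. Here's a small sketch that calls it by hand, assuming Scala 2.10 or later. The shortened Times regex is hypothetical:

```scala
import scala.util.matching.Regex

val Times: Regex = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\).*""".r

// Calling the extractor directly. The WHOLE string must match;
// the result is Some(captured groups as strings) or None.
val hit: Option[List[String]] =
  Times.unapplySeq("Completed in 100ms (View: 25, DB: 75) | 200 OK")
val miss: Option[List[String]] =
  Times.unapplySeq("Rendering users/index")
```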

Because you can use a pattern match, and cases are tried in the order they are written, you can create several regular expressions representing the kinds of lines you want to extract, then process them easily using pattern matching.
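
Here's a sketch of that idea, assuming Scala 2.10+ for string interpolation. Only the "Completed" format comes from the Rails log above. The Processing and AnyWord regexes and the classify function are hypothetical. Since cases are tried top to bottom, the broad AnyWord pattern only sees lines that the earlier, more specific patterns didn't match:

```scala
// Hypothetical line formats; only "Completed" mirrors the log line above.
val Completed  = """Completed in (\d+)ms .*""".r
val Processing = """Processing (\w+)#(\w+) .*""".r
val AnyWord    = """(\w+) .*""".r // broad: would also match the two lines above

// The first case whose regex matches the whole line wins.
def classify(line: String): String = line match {
  case Completed(ms)            => s"completed in $ms ms"
  case Processing(ctrl, action) => s"processing $ctrl#$action"
  case AnyWord(first)           => s"other line starting with $first"
  case _                        => "ignored"
}
```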

Pretty powerful stuff. What would really make my day would be if someone knew how I could extract the values totalTime, viewTime, and dbTime as integers and not have to do a conversion – I’m already matching with \d+. Ideas?
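
One approach, which isn't from the original post: write a small custom extractor that parses the captured text, and nest it inside the regex pattern. The conversion still happens, but it lives in the pattern itself instead of the case body. AsInt and timings are hypothetical names, and this sketch assumes Scala 2.10+ for scala.util.Try:

```scala
import scala.util.Try

// Hypothetical extractor: matches only when the text parses as an Int,
// and binds the Int rather than the String.
object AsInt {
  def unapply(s: String): Option[Int] = Try(s.toInt).toOption
}

val LogEntry = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""".r

// total, view and db are Ints inside the case; no .toInt needed there.
def timings(line: String): Option[(Int, Int, Int)] = line match {
  case LogEntry(AsInt(total), AsInt(view), AsInt(db), _, _) => Some((total, view, db))
  case _ => None
}
```

Using Try means that a group of digits too large for an Int (which \d+ will happily match) makes the case fail instead of throwing a NumberFormatException.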