Using Asynchronous URLFetch on Java App Engine

Developers building applications on top of Java App Engine can use the familiar java.net interface for making off-network calls. For simple requests, this should be more than sufficient. The low-level API, however, provides one feature not available in java.net: asynchronous URLFetch.

The low-level URLFetch API

So what does the low-level API look like? Something like this:

import java.io.IOException;
import java.net.URL;
import java.util.List;
 
import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        try {
            URL url = new URL("http://someurl.com");
            HTTPResponse response = fetcher.fetch(url);
 
            byte[] content = response.getContent();
            // if redirects are followed, this returns the final URL we are redirected to
            URL finalUrl = response.getFinalUrl();
 
            // 200, 404, 500, etc
            int responseCode = response.getResponseCode();
            List<HTTPHeader> headers = response.getHeaders();
 
            for(HTTPHeader header : headers) {
                String headerName = header.getName();
                String headerValue = header.getValue();
            }
 
        } catch (IOException e) {
            // new URL() can throw MalformedURLException, which cannot happen for this hard-coded URL;
            // fetch() can also throw IOException if the request itself fails
        }

The full Javadocs are here.

So it’s a bit different from the standard java.net interface, where we’d get back a reader and iterate line by line on the response. Here the whole body comes back as a byte array, and we’re protected from a heap overflow because URLFetch is limited to 1 MB responses.
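If you still want the line-by-line style of java.net, the byte array can be wrapped in a reader. The following is a minimal sketch, not part of the App Engine API: the class name ContentLines and the method toLines are made up for illustration, and it assumes the response body is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the byte[] from HTTPResponse.getContent() into lines.
class ContentLines {

    // Assumes the body is UTF-8 text; a real fetcher would look at the
    // Content-Type header to choose the charset.
    static List<String> toLines(byte[] content) {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Reading from an in-memory array should never fail
            throw new IllegalStateException(e);
        }
        return lines;
    }
}
```

With the snippet above, this would be called as ContentLines.toLines(response.getContent()).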

Asynchronous vs. Synchronous requests

Using java.net has the advantage of portability – you could build a standard fetcher that will work in any JVM environment, even those outside of App Engine. The tradeoff, of course, is that App Engine specific features won’t be present. The one killer feature of App Engine’s low-level API that isn’t present in java.net is asynchronous URLFetch. What is asynchronous fetch? Let’s make an analogy:

Let’s pretend you, like me at home, are on DSL and have a pretty pathetic downstream speed, and decide to check out a link sharing site like Digg. You browse to the front page and decide to open up every single link. You can do this synchronously or asynchronously.

Synchronously

Synchronously, you click link #1. Now you look at this page. When you are done looking at this page, you hit the back button and click link #2. Repeat until you have seen all the pages. Now, again, you are on DSL, so you don’t just spend time reading each page: before you can read it, you also have to wait for it to load. This can take a significant portion of your time. The total amount of time you sit in front of your computer is thus:

N = number of links
L = loading time per page
R = reading time per page

N * (L + R)

(Yes, before I wrote this equation, I thought that including some mathematical formulas in my blog post would make me look smarter, but as it turns out the equation is comprehensible by 8 year olds internationally/American public high school students)

Asynchronously

Using a tabbed browser, you control-click every single link on the page to open them in new tabs. Now before you go look at any of the pages, you decide to go to the kitchen and make a grilled cheese sandwich. When you are done with the sandwich, you come back to your computer and sit down, enjoying your nice, toasty sandwich while you read articles about Ron Paul and look at funny pictures of cats. How much time are you spending?

N = number of links
S = loading time for the slowest loading page
R = reading time per page
G = time to make a grilled cheese sandwich

MAX((N * R + G), (N * R + S))

Which takes longer, your DSL, or the time it takes you to make a grilled cheese sandwich? The point that I’m making here is that you can save time by parallelizing things. No, I know it’s not a perfect analogy, as downloading N pages in parallel consumes the same crappy DSL connection, but you get what I am trying to say. Hopefully. And maybe you are also in the mood for some cheese.
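To put some numbers on the two formulas, here is a small worked example. The values are invented purely for illustration (10 links, times in seconds), and the class name BrowsingTime is hypothetical.

```java
// Worked example of the two browsing-time formulas, with made-up numbers.
class BrowsingTime {

    // Synchronous: N * (L + R)
    static int synchronous(int n, int load, int read) {
        return n * (load + read);
    }

    // Asynchronous: MAX((N * R + G), (N * R + S))
    static int asynchronous(int n, int slowestLoad, int read, int sandwich) {
        return Math.max(n * read + sandwich, n * read + slowestLoad);
    }

    public static void main(String[] args) {
        // N = 10 links, L = 30s to load each, R = 60s to read each
        System.out.println(synchronous(10, 30, 60));       // prints 900
        // S = 45s for the slowest page, G = 120s to make the sandwich
        System.out.println(asynchronous(10, 45, 60, 120)); // prints 720
    }
}
```

With these numbers the sandwich takes longer than the slowest page, so the loading time disappears from the total entirely.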

Asynchronous URLFetch in App Engine

So what would the URLFetch above look like asynchronously? Probably something like this:

import java.io.IOException;
import java.net.URL;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
 
import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
    protected void makeAsyncRequest() {
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        try {
            URL url = new URL("http://someurl.com");
            Future<HTTPResponse> future = fetcher.fetchAsync(url);
 
            // Other stuff can happen here!
 
            HTTPResponse response = future.get();
            byte[] content = response.getContent();
            URL finalUrl = response.getFinalUrl();
            int responseCode = response.getResponseCode();
            List<HTTPHeader> headers = response.getHeaders();
 
            for(HTTPHeader header : headers) {
                String headerName = header.getName();
                String headerValue = header.getValue();
            }
 
        } catch (IOException e) {
            // new URL() can throw MalformedURLException, which cannot happen for this hard-coded URL
        } catch (InterruptedException e) {
            // Thrown by Future.get() if this thread is interrupted while waiting
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // Thrown by Future.get() if the fetch itself failed; getCause() holds the real error
            e.printStackTrace();
        }
 
    }

This looks pretty similar – EXCEPT: fetchAsync doesn’t return an HTTPResponse. It returns a Future. What is a future?

java.util.concurrent.Future

From the Javadocs:

“A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled.”


What does this mean in English? It means that the Future object is NOT the response of the HTTP call. We can’t get the response until we call the get() method. Between when we call fetchAsync() and get(), we can do other stuff: datastore operations, insert things into the Task Queue, heck, we can even do more URLFetch operations. When we finally DO call get(), one of two things happens:

We’ve already retrieved the URL. Return an HTTPResponse object.
We’re still retrieving the URL. Block until we are done, then return an HTTPResponse object.

In the best case scenario, the other work takes at least as long as the URLFetch, so by the time we call get() the response is already waiting and the fetch costs us essentially nothing extra. In the worst case scenario, the other work finishes first and we block for the remainder of the fetch. Either way, the total is roughly the longer of the two rather than their sum, which lowers the amount of time it takes to return a response to the end-user.
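The blocking behavior of get() can be tried out locally with nothing but java.util.concurrent. The sketch below is an illustration rather than App Engine code: a thread pool and Thread.sleep stand in for fetchAsync and the remote server, and the names FutureTimingDemo, slowCall and fetchWhileWorking are invented for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local stand-in for an asynchronous fetch; no network and no App Engine APIs involved.
class FutureTimingDemo {

    // Pretends to be a slow remote call by sleeping on a background thread.
    static Future<String> slowCall(ExecutorService pool, final long millis) {
        return pool.submit(new Callable<String>() {
            public String call() throws InterruptedException {
                Thread.sleep(millis);
                return "response after " + millis + "ms";
            }
        });
    }

    // Starts the slow call, does "other stuff", then blocks on get() if needed.
    static String fetchWhileWorking(long fetchMillis, long otherWorkMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = slowCall(pool, fetchMillis);
            Thread.sleep(otherWorkMillis); // the other work, simulated
            return future.get();           // returns at once if done, otherwise blocks
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String result = fetchWhileWorking(300, 200);
        long elapsed = System.currentTimeMillis() - start;
        // Elapsed time should land near 300ms (the longer of the two), not 500ms (the sum)
        System.out.println(result + ", total " + elapsed + "ms");
    }
}
```

Future also offers get(timeout, unit) and isDone(), which are useful when you would rather give up or do something else than block indefinitely.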

Twitter Example

So let’s build a servlet that retrieves my tweets. Just for giggles, let’s do it 20 times and see what the performance difference is. We’ll make it so that if we pass a URL parameter, async=true (or anything, for simplicity), we do the same operation using fetchAsync. The code is below:

package com.ikai.urlfetchdemo;
 
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
 
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
 
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
@SuppressWarnings("serial")
public class GetTwitterFeedServlet extends HttpServlet {
 
    protected static String IKAI_TWITTER_RSS = "http://twitter.com/statuses/user_timeline/14437022.rss";
 
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
 
        boolean isSyncRequest = true;
 
        if(req.getParameter("async") != null) {
            isSyncRequest = false;
        }
 
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<h1>Twitter feed fetch demo</h1>");
 
        long startTime = System.currentTimeMillis();
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        URL url = new URL(IKAI_TWITTER_RSS);
 
        if(isSyncRequest) {
            out.println("<h2>Synchronous fetch</h2>");
            for(int i = 0; i < 20; i++) {
                HTTPResponse response = fetcher.fetch(url);
                printResponse(out, response);
            }
        } else {
            out.println("<h2>Asynchronous fetch</h2>");
            ArrayList<Future<HTTPResponse>> asyncResponses = new ArrayList<Future<HTTPResponse>>();
            for(int i = 0; i < 20; i++) {
                Future<HTTPResponse> responseFuture = fetcher.fetchAsync(url);
                asyncResponses.add(responseFuture);
            }
 
            for(Future<HTTPResponse> future : asyncResponses){
                HTTPResponse response;
                try {
                    response = future.get();
                    printResponse(out, response);
                } catch (InterruptedException e) {
                    // Guess you would do something here
                } catch (ExecutionException e) {
                    // Guess you would do something here
                }
            }
 
        }
 
        long totalProcessingTime = System.currentTimeMillis() - startTime;
        out.println("<p>Total processing time: " + totalProcessingTime + "ms</p>");
    }
 
    private void printResponse(PrintWriter out, HTTPResponse response) {
        out.println("<p>");
        out.println("Response: " + new String(response.getContent()));
        out.println("</p>");
    }
 
}

As you can see, it’s a bit more involved to store all the Futures in a list, then to iterate through them. We’re also not being too intelligent about iterating through the futures: we’re assuming the fetches complete first-in-first-out (FIFO), which may or may not be the case in production. A more optimized version might read the responses we expect to arrive quickly before blocking on the ones we expect to be slow. However, empirical testing will show that more often than not, doing things asynchronously is significantly faster for the user than doing them synchronously.
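One way to stop assuming FIFO is to poll isDone() and handle whichever future has finished, instead of blocking on each get() in list order. The following is a rough sketch of that idea, not something from the demo code: the class CompletionOrder and its drain method are hypothetical, and the 10ms back-off is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper: collects results in the order the futures finish.
class CompletionOrder {

    static <T> List<T> drain(List<Future<T>> futures)
            throws InterruptedException, ExecutionException {
        List<Future<T>> pending = new ArrayList<Future<T>>(futures);
        List<T> results = new ArrayList<T>();
        while (!pending.isEmpty()) {
            boolean progressed = false;
            for (Iterator<Future<T>> it = pending.iterator(); it.hasNext();) {
                Future<T> future = it.next();
                if (future.isDone()) {
                    // Already finished, so get() will not block here.
                    // (It still throws if the task failed or was cancelled.)
                    results.add(future.get());
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) {
                Thread.sleep(10); // nothing ready yet; back off briefly (arbitrary interval)
            }
        }
        return results;
    }
}
```

In the servlet above, the second loop could then iterate over CompletionOrder.drain(asyncResponses) and print each HTTPResponse as it comes back. Whether this actually beats the simple FIFO loop depends on how much work is done per response, so it is worth measuring before adopting it.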

Using Asynchronous URLFetch and HTTP POST

So far, our examples have been focused on read operations. What if we don’t care about the response? For instance, what if we decide to make an API call that essentially is a “write” operation, and can, for the most part, safely assume it will succeed?

// JavaAsyncUrlFetchDemoServlet.java
package com.ikai.urlfetchdemo;
 
import java.io.IOException;
 
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
 
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
 
@SuppressWarnings("serial")
public class JavaAsyncUrlFetchDemoServlet extends HttpServlet {
 
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
 
        long startTime = System.currentTimeMillis();
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        fetcher.fetchAsync(FetchHelper.makeGuestbookPostRequest("Async", "At" + startTime));
        long totalProcessingTime = System.currentTimeMillis() - startTime;
 
        resp.setContentType("text/html");
        resp.getWriter().println("<h1>Asynchronous fetch demo</h1>");
        resp.getWriter().println("<p>Total processing time: " + totalProcessingTime + "ms</p>");
    }
 
}

// FetchHelper.java
package com.ikai.urlfetchdemo;
 
import java.net.MalformedURLException;
import java.net.URL;
 
import com.google.appengine.api.urlfetch.HTTPMethod;
import com.google.appengine.api.urlfetch.HTTPRequest;
 
public class FetchHelper {
 
    protected static final String signGuestBookUrl = "http://bootcamp-demo.appspot.com/sign";
 
    public static HTTPRequest makeGuestbookPostRequest(String name, String content){
        HTTPRequest request = null;
        URL url;
        try {
            url = new URL(signGuestBookUrl);
            request = new HTTPRequest(url, HTTPMethod.POST);
            String body = "name=" + name + "&content=" + content;
            request.setPayload(body.getBytes());
 
        } catch (MalformedURLException e) {
            // Do nothing
        }
        return request;
    }
}

I’ve decided to spam my own guestbook here, for better or for worse.
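One caveat about the POST body in FetchHelper: the name and content values are concatenated as-is, so a value containing a space, an & or an = would corrupt the form data. A possible fix, sketched below under the assumption that the guestbook expects standard form encoding, is to run each value through java.net.URLEncoder. The class name FormBody and its encode method are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a form-encoded body for the two guestbook fields.
class FormBody {

    static String encode(String name, String content) {
        try {
            return "name=" + URLEncoder.encode(name, "UTF-8")
                    + "&content=" + URLEncoder.encode(content, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("Async", "At 123 & counting"));
        // prints: name=Async&content=At+123+%26+counting
    }
}
```

In makeGuestbookPostRequest, the payload line would then become request.setPayload(FormBody.encode(name, content).getBytes()).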

Download the code

You can get the code from this post using git: http://github.com/ikai/Java-App-Engine-Async-URLFetch-Demo