Ikai Lan says

I say things!

Clearing up some things about LinkedIn mobile’s move from Rails to node.js

There’s an article on highscalability that’s talking about the move from Rails to node.js (for completeness: its sister discussion on Hacker News). It’s not the first time this information has been posted. I’ve kind of ignored it for now (because I didn’t want to be this guy), but it’s come up enough times and no one has spoken up, so I suppose it’s up to me to clear a few things up.

I was on the team at LinkedIn that was responsible for the mobile server, and while I wasn’t the primary contributor to that stack, I built and contributed several things, such as the unfortunate LinkedIn WebOS app which made use of the mobile server (and a few features) and much of the initial research behind productionizing JRuby for web applications (I did much more stuff that wasn’t published). I left LinkedIn in 2009, so I apologize if any new information has surfaced. My hunch is that even if I’m off, I’m not off by that much.

Basically: the article is leaving out several facts. We can all learn something from the mobile server and software engineering if we know the full story behind the whole thing.

In 2008, I joined a software engineering team at LinkedIn that was focused on building things outside the standard Java stack. You see, back then, to develop code for linkedin.com, you needed a Mac Pro with 6 gigs of RAM just to run your code. And those requirements kept growing. If my calculations are correct, the standard setup for engineers now is a machine with 20 or more gigabytes of RAM just to RUN the software. In addition, each team could only release once every 6 weeks (this has been fixed in the last few years). It was deemed that we needed to build out a platform off the then-fledgling API and start creating teams to get off the 6 week release cycle so we could iterate quickly on new features. The team I was on, LED, was created for this purpose.

Our first project was a rotating globe that showed off new members joining LinkedIn. It used to run on Poly9, but when they got shut down, it looks like someone migrated it to use Google Earth. The second major project was m.linkedin.com, a mobile web client for LinkedIn that would be one of the major clients of our fledgling API server, codenamed PAL. Given that we were building out an API for third parties, we figured that we could eat our own dogfood and build out LinkedIn for mobile phones with browsers. This is 2008, mind you. The iPhone had just come out, and it was a very Blackberry world.

The stack we chose was Ruby on Rails 1.2, and the deployment technology was Mongrel. Remember, this is 2008. Mongrel was cutting edge Ruby technology. Phusion Passenger wasn’t released yet (more on this later), and Mongrel was light-years ahead of FastCGI. The problem with Mongrel? It’s single-threaded. It was deemed that shipping fast was more important than CPU efficiency, a choice I agreed with. We were one of the first products at LinkedIn to do i18n (well, we only did translations) via gettext. We deployed using Capistrano, and were the first ones to use nginx. We did a lot of other cool stuff, like experimenting with Redis and learning a lot about memcached in production (nowadays this is a given, but there was a lot of memcached vs EHCache talk back then). Etc, etc. But I’m not trying to talk about how much fun I had on that team. Well, not primarily.

I’m here to clear up facts about the post about moving to node.js. And to do that, I’m going to go back to my story.

The iPhone SDK had shipped around that time. We didn’t have an app ready for launch, but we wanted to build one, so our team did, and we inadvertently became the mobile team. So suddenly, we decided that this array of Rails servers that made API calls to PAL (which was, back then, using a pre-OAuth token exchange technology that was strikingly similar) would also be the primary API server for the iPhone client and any other rich mobile client we’d end up building, this thing that was basically using Rails rhtml templates. We upgraded to Rails 2.x+ so we could have the respond_to directive for different outputs. Why didn’t we connect the iPhone client directly to PAL? I don’t remember. Oh, and we also decided to use OAuth for authenticating the iPhone client. Three legged OAuth, so we also turned those Rails servers into OAuth providers. Why did we use 3-legged OAuth? Simple: we had no idea what we were doing. I’LL ADMIT IT.

Did I mention that we hosted outside the main data centers? This is what Joyent talks about when they say they supplied LinkedIn with hosting. They never hosted linkedin.com proper on Joyent, but we had a long provisioning process for getting servers in the primary data center, and there were these insane rules about no scripting languages in production, so we decided it was easier to adopt an outside provider when we needed more capacity.

Here’s what you were accessing if you were using the LinkedIn iPhone client:

iPhone -> m.linkedin.com (running on Rails) -> LinkedIn’s API (which, for all intents and purposes, only had one client, us)

That’s a cross data center request, guys. Running on single-threaded Rails servers (every request blocked the entire process), running Mongrel, leaking memory like a sieve (this was mostly the fault of gettext). The Rails server did some stuff, like translations, and transformation of XML to JSON, and we tested out some new mobile-only features on it, but beyond that it didn’t do a lot. It was little more than a proxy. A proxy with a maximum concurrency factor dependent on how many single-threaded Mongrel servers we were running. The Mongrel(s), as we affectionately referred to them, often bloated up to 300 MB of RAM each, so we couldn’t run many of them.
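To put rough numbers on that concurrency ceiling, here is a back-of-the-envelope sketch in Python. Only the roughly 300 MB per Mongrel figure comes from the description above; the RAM budget and the per-request latency are made-up values for illustration, not measurements from the actual deployment, and the function names are hypothetical.

```python
def workers_that_fit(ram_mb: int, mb_per_worker: int) -> int:
    """How many single-threaded worker processes fit in a RAM budget."""
    if ram_mb < 0 or mb_per_worker <= 0:
        raise ValueError("ram_mb must be >= 0 and mb_per_worker must be > 0")
    return ram_mb // mb_per_worker


def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Throughput ceiling when each worker handles exactly one request at a time.

    A blocked worker serves nothing else, so the whole pool can finish at most
    workers / seconds_per_request requests per second.
    """
    if workers < 0 or seconds_per_request <= 0:
        raise ValueError("workers must be >= 0 and seconds_per_request must be > 0")
    return workers / seconds_per_request


if __name__ == "__main__":
    # Assumed values: a 4 GB box and 250 ms spent waiting on the upstream API.
    workers = workers_that_fit(ram_mb=4096, mb_per_worker=300)
    ceiling = max_requests_per_second(workers, seconds_per_request=0.25)
    print(f"{workers} workers, at most {ceiling:.0f} requests/second")
    # prints "13 workers, at most 52 requests/second"
```

The point of the arithmetic: with one request per process, the only ways to raise the ceiling are more RAM, smaller processes, or a faster upstream call.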

At this time, I was busy productionizing JRuby. JRuby, you see, was taking full advantage of Rails’ ability to serve concurrent requests using JVM concurrency. In addition, JRuby outperformed MRI in almost every real benchmark I threw at it – there were maybe 1 or 2 specific benchmarks where it didn’t. I knew that if we ported the mobile server to JRuby, we could have gotten more performance and gotten way more concurrency. We would have kept the same ability to deploy fast with the option to in-line into many of the Java libraries LinkedIn was using.

But we didn’t. Instead, the engineering manager at the time ruled in favor of Phusion Passenger, which, to be fair, was an easier port than JRuby. We had come to depend on various native extensions, gettext being the key one, and we didn’t have time to port the translations to something that was JRuby friendly. I was furious, of course, because I had been evangelizing JRuby as the best Ruby production environment and no one was listening, but that’s a different story for a different time. Well, maybe some people listened; those Square guys come to mind.

This was about the time I left LinkedIn. As far as I know, they didn’t build a ton more features. Someone told me that one of my old teammates suddenly became fascinated with node.js, and pretty much singlehandedly decided to rewrite the mobile server using node. Node was definitely a better fit for what we were doing, since we were constantly blocking on a cross data center call, and a non-blocking server for I/O has been shown to be highly advantageous from a performance perspective. Not to mention: we never intended for the original Ruby on Rails server to be used as a proxy for several years.
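The difference between blocking and not blocking on that upstream call can be sketched in a few lines. This is a toy Python illustration, not the mobile server's code: the 50 ms latency is an assumed stand-in for the cross data center round trip, a sleep stands in for the real network call, and every name here is hypothetical.

```python
import asyncio
import time

UPSTREAM_LATENCY = 0.05  # seconds; assumed stand-in for the cross data center call


def blocking_call() -> str:
    """Simulated upstream call that ties up the whole process while it waits."""
    time.sleep(UPSTREAM_LATENCY)
    return "ok"


async def non_blocking_call() -> str:
    """Simulated upstream call that yields to the event loop while it waits."""
    await asyncio.sleep(UPSTREAM_LATENCY)
    return "ok"


def serve_blocking(n: int) -> float:
    """Handle n requests one after another, the way a single-threaded worker does."""
    start = time.perf_counter()
    for _ in range(n):
        blocking_call()
    return time.perf_counter() - start


async def _serve_all(n: int) -> list:
    # All n waits overlap on a single thread.
    return await asyncio.gather(*(non_blocking_call() for _ in range(n)))


def serve_non_blocking(n: int) -> float:
    """Handle n requests concurrently on one event loop."""
    start = time.perf_counter()
    asyncio.run(_serve_all(n))
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10
    print(f"blocking:     {serve_blocking(n):.2f}s")      # roughly n * latency
    print(f"non-blocking: {serve_non_blocking(n):.2f}s")  # roughly one latency
```

Neither version does less work per request. The non-blocking one simply stops paying for idle waiting, which is why it suits a server that spends most of its time waiting on another server.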

So, knowing all the facts, what are all the takeaways?

  • Is v8 faster than MRI? MRI is generally slower than YARV (Ruby 1.9), and, at least in these benchmarks, I don’t think there is any question that v8 is freakin’ fast. If node.js blocked on I/O, however, this fact would have been almost completely irrelevant.
  • The rewrite factor. How many of us have been on a software engineering project where the end result looked nothing like what we planned to build in the first place? And, knowing fully the requirements, we know that, if given time and the opportunity to rebuild it from scratch, it would have been way better? Not to mention: I grew a lot at LinkedIn as a software engineer, so the same me several years later would have done a far better job than the same me in 2008. Experience does matter.
  • I see that one of the advantages of the mobile server being in node.js is people could “leverage” (LinkedIn loves that word) their Javascript skills. Well, LinkedIn had/has hundreds of Java engineers! If that was a concern, we would have spent more time exploring Netty. Lies, damn lies, and benchmarks, I always say, but I think it’s safe for us to say that Netty (this is vertx, which sits on top of Netty) is at least as fast as node.js for web serving.
  • Firefighting? That was probably a combination of several things: the fact that we were running MRI and leaked memory, or the fact that the ops team was 30% of a single guy.

What I’m saying here is use your brain. Don’t read the High Scalability post and assume that you must build your next technology using node.js. It was definitely a better fit than Ruby on Rails for what the mobile server ended up doing, but it is not a performance panacea. You’re comparing a lower level server to a full stack web framework.

That’s all for tonight, folks, and thank you internet for finally goading me out of hiding again.

- Ikai


Written by Ikai Lan

October 4, 2012 at 6:34 pm

13 Responses


  1. It seemed awfully hasty to throw away the codebase for a newfangled node.js server. I personally have experience with Java platforms (Grails, JRuby, Netty, and all manner of backend tech). I suppose if the goal was to merge the backend and mobile engineering teams it was a great choice but unlike what the HighScalability post might claim it really doesn’t seem like they were out of options for scaling Rails.

    It’s good to know someone had sense over there.

    Michael Rose (@Xorlev)

    October 4, 2012 at 9:19 pm

  2. I wish you had written about this a year (or two) back and that I had come across your article then. I had experienced a similar problem while using Mongrel (single-threaded), and posted about it on Stack Overflow (http://stackoverflow.com/questions/6278817/is-sinatra-multi-threaded). In fact I gave thin a shot, which has experimental multi-thread support. I found thin unstable then and we did not go with it. Fortunately we migrated to JRuby and used Mizuno (a wrapper on Jetty) to achieve concurrency without switching technology stack or throwing away a few months of coding effort. I am glad that you have detailed this for the better understanding of others. It has otherwise been a difficult job for me to explain the scenario to others.

    Chandan Kumar

    October 4, 2012 at 10:15 pm

  3. Gosh! When I read that article I went nuts; there are so many things in it that will leave a wrong impression. I once did a blog post on benchmarking NodeJS against basic Java stuff (not even a sophisticated framework) and I was harassed by the NodeJS police http://blog.creapptives.com/post/9677133069/node-on-nails . What I saw in the complete LinkedIn (mobile) case was something interesting that can work well on NodeJS. But I wonder if people pay attention to details.

  4. Thanks for this extremely informative article… the highscalability article’s headline takeaways seemed way too strong to be the full story

    johnconroy (@johnconroy)

    October 5, 2012 at 12:10 am

  5. Really great article. Your frankness on how IT decisions get made is refreshing for those of us who’ve been living that, and wondering why everyone else seems to think they’re all 100% rational. :D



    October 5, 2012 at 5:45 am

  6. Appreciate your post, based on my experience with a heavily utilized Rails 1.2.3 app, my findings are quite similar (the one I’m looking to port launched in 2007 and averages > 1m req/month).

    Russ Rollins

    October 5, 2012 at 7:27 am

  7. Awesome post! I work with both Rails and Node daily, and I use them for what they’re good for. I’m curious: I hate Java, but could I see performance gains from running my Ruby programs in JRuby instead of YARV?


    October 5, 2012 at 10:42 am

  8. [...] Node.js some time ago for performance and scalability reasons. A former LinkedIn team member reacted explaining what went wrong, in his [...]

  9. > MRI is generally slower than YARV (Ruby 1.9),

    Just so you know, Ruby 1.9 is officially also MRI, it was only called YARV during development.

  10. Thank you for posting this. I had a feeling that there was more to the story than the node evangelists would have us believe. The rewrite factor being the most obvious. Any time you start a project with all of the requirements in mind, the result is going to be much better than when it grows organically.

    James Dunn

    October 7, 2012 at 5:08 am

  11. Cool story bro!

    I guess this situation with scalability leads to the point where you have to ask what is more important: knowing my currently deployed stack and understanding it, or installing a random solution, picking 3 metrics which the newer stack does better, and writing an article about it. If the second one is cheaper (less engineering effort), do it; no question, this is a strategic decision. However, this does not tell you anything about how good the old platform is, or how good the new platform is at all. I think using better solutions with Rails could have been less effort to fix your performance and scalability (without having numbers I can’t be sure).

    The other part is the “what language is better” argument. I guess a scientific comparison of these languages (Ruby and JS) could not find major differences; it just feels wrong to me to write JavaScript for a backend system when I know that I can express the problem way better in Ruby. I guess there are a couple of other languages I would consider before I would even think about JS, but again, it is just personal preference. If there was a must situation I would probably use CoffeeScript, which is more concise than JS.

    Thanks for this writeup!

    Istvan (@lix)

    October 7, 2012 at 7:47 pm

  12. [...] More background by Ikai Lan, who worked on the mobile server team at LinkedIn, says some facts were left out: the app made “a cross data center request, guys. Running on single-threaded Rails servers [...]

  13. Man, that was a lot of fun … and a lot of learning. Nice post, Ikai.

    And we didn’t connect the iPhone app to PAL because “the mobile platform” was a better aggregation of data from multiple calls into a more flexible collection of objects which better served our purposes then. Some excellent work was done to make that happen by pretty much everyone in the group.

    I still remember how fast we could provision and expand at Joyent. Made life much easier in our, er, less than resource efficient stack. We always knew there would come a time when it was more important to consolidate and drive to efficiency; at the time, we were optimized for shipping features and learning.

    And regarding JRuby, the folks at http://eng.wealthfront.com/ got it, too. They’re a smart bunch. I’m glad to be with them, though I’ll always miss LED.

    Jim Meyer

    October 9, 2012 at 1:49 am
