Ikai Lan says

I say things!

Posts Tagged ‘Scala

Using pattern matching with regular expressions in Scala

with 11 comments

I’ve been trying to use Scala more and more so I can gain some experience and exposure to it. A couple of weeks ago, I wrote a Scala log parser for Ruby on Rails. It is terribly newbie-ish – the classes are mutable and it’s disorganized. It’s a mess. Jorge Ortiz from the Scala mailing list was kind enough to rewrite it in a more Scala style. It completely blew my mind how terse Scala can become when written correctly.

It bothered me, however, dealing with regular expressions the way that I did. The Java interface is pretty clumsy and nowhere near as clean as regular expression pattern extraction in Perl or Ruby.

As it turns out, it’s surprisingly easy to extract text using Regular Expressions in Scala. Throw away Pattern.compile! Check out this hotness below:

First, let’s import Scala’s regex package:

import scala.util.matching.Regex

Now we declare a regular expression to match against. We can do this one of two ways:

val LogEntry = new Regex("""Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""")

I use triple quotes here to signify that I am creating a raw string. A raw string means that I do not need to escape characters like the \ character. If I didn’t do this, I’d be forced to use strings like “\\d+”. Believe it or not, that extra slash throws me off. Just goes to show that I have written way too many parsers.

Alternatively, I can declare a new Regex by doing this:

val LogEntry = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""".r

Strings have a method called “r”, which will convert it to a Regex object. I’m not sold on this syntax at the moment, since it doesn’t play well with eyeball scans, but I’m putting it here for those folks that absolutely need to save characters.

There’s nothing really special here yet. The next step is REALLY cool:

val line = "Completed in 100ms (View: 25, DB: 75) | 200 OK [http://app.domain.com?params=here]"

scala> val LogEntry(totalTime, viewTime, dbTime, responseCode, uri) = line
totalTime: String = 100
viewTime: String = 25
dbTime: String = 75
responseCode: String = 200
uri: String =

The local variables totalTime, viewTime, dbTime, responseCode and uri are now bound to the values we want to extract from the original line! The regular expression value defines an unapplySeq method. I’m not quite good enough at Scala to tell you in any definite terms what that means, except that you can use the code in a pattern match:

line match {
case LogEntry(totalTime, viewTime, dbTime, responseCode, uri) => {
/* Process the data */
// do something with totalTime.toInt
// do something with viewTime.toInt
// etc ...
case _ => // Do nothing

Because you can use a pattern match, and patterns will be be matched in the order of definition, this means that you can create several regular expressions representing lines you want to extract, then process them easily in using pattern matching.

Pretty powerful stuff. What would really make my day would be if someone knew how I could extract the values totalTime, viewTime, and dbTime as integers and not have to do a conversion – I’m already matching with \d+. Ideas?

Written by Ikai Lan

April 4, 2009 at 5:41 pm

Posted in Scala

Tagged with , , ,

Twitter, Ruby on Rails, Scala and people who don’t RTFA

with 34 comments

The Register recently published an article titled, “Twitter jilts Ruby for Scala“, prompting a wave of Tweets (Twitter messages for the jargon challenged) about Ruby on Rails and scaling. More specifically, its lack of ability to do do. The source of this article was a talk given by the API lead at Twitter, Alex Payne, at the Web 2.0 Expo in San Francisco, where he discussed porting parts of Twitter from Ruby to Scala. Naturally, the armchair commentators added their two cents about the message, taking it to mean that Scala is *the* one true language for internet companies, and that Ruby on Rails should not be used because it cannot scale.

I tweeted about my annoyance with retweeters that don’t understand what they are retweeting, and Alex responded: “It’s really frustrating for me. I don’t think the people who were there for my talk got that impression. The press skewed it.”

Alex wasn’t saying that Rails can’t scale. Alex was saying that he really likes Scala, and parts of Twitter that were written in Ruby were rewritten in Scala because some tasks are simply not appropriate for Ruby. I know of Kestrel, their messaging queue, but likely there are other middleware layers that sit between their APIs/web interfaces and their data sources that are being ported.

Now, Scala is an amazing language. It compiles to JVM bytecode, and thus, most procedures will run near or at the speed of Java. Functions can be treated as first class objects and passed around like variables. Its system of type inference means less ceremony for simply declaring variables. Traits function remarkably like mixins in dynamic languages, allowing very rich composite classes that, in many cases, need to add very little boilerplate for the composition, unlike Java interfaces. And pattern matching with wildcards? I can’t even think of an analogy for how powerful of a tool this is. Is it the best compiled language? Some think so, and I can certainly see where they are coming from. I’m learning Scala as we speak. The basics were pretty easy to pick up, but I’m still struggling to be good with it. Scala best practices such as programming in an immutable style are far easier to preach than to actual do.

But Scala is not Ruby, and will never be Ruby. Ruby is probably the most powerful programming language on the planet for creating DSLs, or domain specific languages. It can do this because any class or object can be extended, any method overridden, any constant undefined and redefined, and so on, and so forth. You can take an Integer and declare more methods on it. Duck typing means that you don’t have to set up boilerplate interfaces and abstract classes to create objects that can replace other objects with similar interfaces. A web development platform as powerful and syntactically concise as Rails could only have been done in Ruby. Imagine this: a developer looks at a model file representing a database table, and immediately understands the business rules regarding validity as well as the callbacks that will take place when the object is created, updated or destroyed. Or this: it is possible to create a plugin that extends collections objects to allow very complex pagination, and a client developer only has to include a file or remove the include statement. This plugin recognizes if the collection in question maps to a database, and if so, applies LIMIT and OFFSET statements to SQL queries as to not eagerly load too many objects at once. This plugin exists.

This power does not come without a price. Ruby is slow. Ruby is generally recognized as being twice as slow as Python, and at least an order of magnitude slower than C++ or Java. Charles Nutter, one of the creators of the JRuby VM for the Ruby language, recently posted an article about optimizations that could be made to the VM, and he improves Ruby performance by as much as twenty times by removing much of Ruby’s power and breaking compatibility. If anything, it shows us skeptics that there is no such thing as a free lunch.

So if Ruby is slow, and Rails uses Ruby, does that mean Rails cannot scale? Let me make this clear right now: Though Ruby is slow, this does not mean that Rails does not scale. Scalability is a very difficult concept, often oversimplified to be synonymous with efficiency. Efficiency is a facet of scalability, yes, but it is not its only dimension. Reliability and uptime are very important dimensions of a scalable system. So is horizontal scalability: the ability to serve an increase in usage with a corresponding increase in computational nodes. Though it may seem like a given that throwing hardware at a problem will increase capacity, this simply isn’t so. Anybody that has worked on a large data driven application will tell you that a centralized, authoritative source such as a database cluster is a bottleneck, and simply throwing hardware at the problem results in diminishing marginal returns. Then there is quality of service. A truly scalable architecture will have minimal service degradation as users and usage exponentially increase.

I use the word “architecture” because a truly scalable system that meets the requirements set by its intended consumer is rarely, if ever, about a single component. Even if we were just talking about speed and none of the other dimensions of scalability, there’s a great article by Joe Stump where he calls out language critics about this exact subject. C++ can probably assemble HTML at least a hundred times faster than Ruby, but this has minimal impact on what a user sees because disk reads are slow, talking to a data source is slow, and sending data over the internet is *really* slow. On top of all that, Steve Souders, creator of YSlow, a popular tool for benchmarking perceived speed of web sites, argues that for most sites, 90% of the wait time a user has to put up with is a result of loading assets, JavaScript and stylesheets. Rails and proper database constraints can ensure data integrity (reliability). Rails is by default stateless and sessions can be centralized (fault tolerance and horizontal scalability), check. Rails can scale, and does scale. Twitter is not replacing their web tier because it is fast enough, and they have focused on optimizing their middle tier, which sits between their web layer and their data layer. There’s no point in replacing this layer. Its performance and reliability are completely bound by the data and middleware layers.

This is where Scala enters the picture: middleware. Likely this refers to a non-RDBMS datasource serving denormalized social graph, a dispatching component for pushing messages through SMS and email, a queue and queue workers. There are a few requirements here: concurrency and computational speed, both of which are areas where Ruby falls flat because MRI Ruby, the stock VM, uses green threads which block on I/O and do not make use of multicore processors. In addition, Ruby’s memory requirements are aggressive. Scala, on the other hand, is fast and can is as concurrent as the JVM will allow, which is pretty damned concurrent.

So what Twitter did was optimize the bottleneck and leave their investment in the front-end intact. This shouldn’t surprise anybody. Facebook started out on PHP, but now their backend is an amalgam of C/C++, Erlang and other languages. Google runs Java and C++ for their backend, but I’m told several web tier services are written in Python. And so forth. This doesn’t mean don’t use Python, and don’t use PHP. It just means to be ready to optimize and possibly to replace those components when the time is right, which, for many startups, is a LOOOOOOONG way off.

In fact, I still push Ruby on Rails as the development platform of choice for web startups launching a 1.0. Why? There are many reasons. Here are a few:

  • You will hit the ground running fast, and you will have something working within a single development cycle. If your application has any kind of boilerplate, like user management, pagination, multiformat output, simple AJAX or CRUD functionality, Rails just saved you from having to write most of it.
  • Onboarding developers is fast. I’ve been on projects where new developers have been able to be productive the first or second day of looking at a project’s structure and tests
  • Rails emphasizes the importance of test driving development, and any good Rails developer will feel wrong writing code without corresponding tests – when I started Scala, I settled on the NetBeans IDE because it seemed like the easiest to get going with JUnit
  • Solving scalability problems too early is a bad idea. Nobody starts out with a sharded database. Instead of building features or attracting customers to a usable product, you spend your time building an incredibly complex system that will scale but at severe opportunity cost if it hits the market late. You want scalability problems, because it means you are growing too fast. In the words of The Wire’s Marlo Stanfield, “that sound like one of them good problems.”

– Ikai

Written by Ikai Lan

April 2, 2009 at 9:58 pm

Posted in Ruby on Rails, Scalability

Tagged with , ,

First impressions of Lift Scala web development framework from a Ruby on Rails developer

with 8 comments

Over the past few months I’ve been hearing a lot about Scala and, in general, very interested. Scala, short for “scalable language” (and why I will continue to pronounce it “skay-lah” rather than “skah-lah”) is a strongly typed JVM language that combines aspects of functional programming and dynamic languages with static typing and the JVM to provide a language that has some of the flexibility of languages such as Ruby or Haskell, but with the performance and interoperability of Java.

One of the projects leading the charge in Scala is Lift, a web development framework. I’ve been developing in Ruby on Rails for the past few years, and really, I don’t need to learn another framework. I took a very close examination at Django and even built a few projects, but stuck with Rails as my primary tool of choice. Lift interests me for the following reasons:

  • can run in any Java Application Server
  • can run inline Java

Why are these important? I’ve met a lot of consultants who will recommend a solution for a client that requires a deployment mod_php/Erlang/WSGI/Mongrel and got the project shot down. But switch the pitch to Java running in an application server? Happy client. I’ve been pushing JRuby hard on every other Rails developer I meet, so these requirements are not high on my list, though they add a lot of points. Also – it should be worth mentioning that many, many Java based frameworks such as Grails or the recently open sourced AribaWeb can do these.

  • high performance* (have not seen with my own eyes)
  • uses Scala
  • out of the box Comet support

Comet is a way of simulating push applications over HTTP using a browser, which is completely a client pull type of application. Comet is also known as long polling, and it is how every single JavaScript browser chat application works (Meebo, Facebook chat, etc). Basically, the client JavaScript opens an XMLHttpRequest (XHR for short) to the server which the server does not respond to until there is data to be pushed (hence the name “long polling”). This gets rid of the need for clients to poll the server at intervals which has two problems: at longer intervals, data is not pushed as quickly, which would make an IM application unusable. At shorter intervals, browsers would quickly saturate their network connection as well as the server. Comet isn’t without its problems. Most Java applications, for instance, use a “one thread per request” model for each open connection, which does not scale efficiently as the threads would be the limiting factor in the number of clients that could connect at once even though most of the time the threads would be idle. In fact, open connections was such a problem that Facebook’s chat implementation is completely written in Erlang, a concurrent language that can create millions of lightweight processes and was really the only implementation that could scale efficiently to their needs. They blog about it here: http://www.facebook.com/note.php?note_id=14218138919

The way Lift deals with Comet scaling issues is by making use of Jetty Continuations. The thread-per-request model is still used, however, threads are suspended when they are not needed, resulting in a much more efficient use of resources.

It’s these reasons that make Lift appealing to me to learn. However, after fiddling with Lift, there are a few things off the top of my head that I already don’t like much about Lift or are so different they threaten to make my brain reboot:

  • Servers are not stateless. If you need to horizontally scale, your load balancer needs to read the JSESSIONID parameter in the HTTP request and direct traffic based on that information. I’ve been told that Lift is so incredibly high performance this isn’t necessary. This doesn’t answer the question of hot failover, and frankly, I was a bit disappointed.
  • Everything depends on state! This is probably WHY Lift can be so high performance. Most web frameworks deal with requests as they come in, looking up the same data per request to reinstantiate session objects, User objects, or any other objects that need to persist longer than a web request. It’s a completely different way of thinking about problems and web development that experienced web developers coming into Lift from other frameworks will have to come in with a blank slate. Lift encourages abstracting away the request/response cycle. It remains to be seen whether this is a good thing or a bad thing.
  • Unintuitive way to add new pages. You have to add to a sitemap in what might be the most unintuitive manner possible:
    val entries = Menu(Loc("Home", List("index"), "Home")) :: Menu(Loc("Test", List("test"), "Test")) :: User.sitemap

    The :: is the operator for list concatenation. This will make all your pages appear in the site menu. To make pages that don’t appear in the site menu?

    Menu(Loc("MyHiddenPage", List("hidden"), "hidden", Hidden))

    To me this seems unintuitive, but then again, I don’t actually understand what is happening here, as the API docs are unusable. I haven’t figured out routing yet, for instance. When I was learning Django, I figured out how to set up routes to functions in minutes.

  • Scala shorthand. I’ve mentioned this and all the functional programmers have screamed bloody murder. Code is for human beings, not cyborgs. I understand that it’s clever to save keystrokes:
    list.sort(_ < _)

    As opposed to, say, Ruby:

    list.sort { |a, b| a < b }

    But these are trivial examples. There’s code all over the place that looks like this:

    fun1(a, b _).call(_).fun2(_ <= _)

    I’m sorry, but that’s not very welcoming. The worst part is that Scala HAS verbose syntax. You can use underscore notation, or you can use:

    () => something
     (x) => x.something

    I was at a job interview once where I was asked to write code to solve a problem using any language I wanted. I wrote a monstrous Ruby one-liner that was fun to write but that I would never, ever write in real life if I expected other developers to read my code. Sometimes there really is such a thing as too clever.

  • Dearth of working tutorials or documentation* (I plan on writing tutorials for as long as I am learning or interested in Lift)

In spite of these things I still think Lift has an interesting approach to many of the problems of web development. Rather than judge Lift based on my initial impressions, I’ll get into it some more before I decide if it’s a technology that I’d push. Ruby On Rails was one such technology, and in spite of all of its problems I still view it as an amazing development platform. The difference here is that by the time I started learning, several books had already been written, and there were plenty of tutorials on the web.

One of the problems with Lift is that most of the tutorials I have seen so far are written by actual developers of Lift. As a developer, it’s hard to figure out what people need to know. There’s a tendency to say to new developers coming on, “You only need to know A, B and C.” Then, thirty minutes later, when the new guy is completely lost, “Oh, and D! Sorry!” As a newbie, trust me, I won’t miss D. It’ll be in my face such that I’ll get mad, stop coding, then complain on Twitter, on the mailing list and in this blog.

As for the immediate future, I’ll continue writing tutorials for Lift, beginning with a tutorial coming soon about how to get started developing on Lift with NetBeans. Stay tuned.

Written by Ikai Lan

March 3, 2009 at 8:39 pm

Posted in Lift, Scala

Tagged with ,