Archive for the ‘Scala’ Category
Using pattern matching with regular expressions in Scala
I’ve been trying to use Scala more and more so I can gain some experience and exposure to it. A couple of weeks ago, I wrote a Scala log parser for Ruby on Rails. It is terribly newbie-ish – the classes are mutable and it’s disorganized. It’s a mess. Jorge Ortiz from the Scala mailing list was kind enough to rewrite it in a more Scala style. It completely blew my mind how terse Scala can become when written correctly.
It bothered me, however, dealing with regular expressions the way that I did. The Java interface is pretty clumsy and nowhere near as clean as regular expression pattern extraction in Perl or Ruby.
As it turns out, it’s surprisingly easy to extract text using Regular Expressions in Scala. Throw away Pattern.compile! Check out this hotness below:
First, let’s import Scala’s regex package:
import scala.util.matching.Regex
Now we declare a regular expression to match against. We can do this one of two ways:
val LogEntry = new Regex("""Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""")
I use triple quotes here to signify that I am creating a raw string. A raw string means that I do not need to escape characters like the \ character. If I didn't do this, I'd be forced to use strings like "\\d+". Believe it or not, that extra slash throws me off. Just goes to show that I have written way too many parsers.
Alternatively, I can declare a new Regex by doing this:
val LogEntry = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\) \| (\d+) OK \[http://app.domain.com(.*)\?.*""".r
Strings have a method called "r", which converts the string to a Regex object. I'm not sold on this syntax at the moment, since it doesn't play well with eyeball scans, but I'm putting it here for those folks that absolutely need to save characters.
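As a quick sanity check, here is a minimal sketch showing that the two forms (and the doubled-backslash version) should all describe the same pattern. The object name and the shortened pattern are made up for illustration and are not part of the original parser:

```scala
import scala.util.matching.Regex

object RegexForms {
  // Shortened, hypothetical pattern; only the construction style matters here.
  val viaNew: Regex = new Regex("""Completed in (\d+)ms""")

  // The same pattern built with the "r" method on a triple-quoted string.
  val viaR: Regex = """Completed in (\d+)ms""".r

  // Without triple quotes, every backslash has to be doubled.
  val escaped: Regex = "Completed in (\\d+)ms".r
}
```

Comparing RegexForms.viaNew.regex with RegexForms.viaR.regex and RegexForms.escaped.regex is one way to convince yourself the three are interchangeable.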
There's nothing really special here yet. The next step is REALLY cool:
val line = "Completed in 100ms (View: 25, DB: 75) | 200 OK [http://app.domain.com?params=here]"
scala> val LogEntry(totalTime, viewTime, dbTime, responseCode, uri) = line
totalTime: String = 100
viewTime: String = 25
dbTime: String = 75
responseCode: String = 200
uri: String =
The local variables totalTime, viewTime, dbTime, responseCode and uri are now bound to the values we want to extract from the original line! The regular expression value defines an unapplySeq method. I’m not quite good enough at Scala to tell you in any definite terms what that means, except that you can use the code in a pattern match:
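For the curious, here is a rough sketch of what unapplySeq seems to hand back when called directly: an Option holding the list of captured groups when the whole string matches, and None otherwise. The object name, the helper and the shortened pattern are assumptions made for this example:

```scala
import scala.util.matching.Regex

object UnapplySeqSketch {
  // Shortened, hypothetical pattern for illustration only.
  val Timing: Regex = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\).*""".r

  // Some(captured groups) when the whole line matches the regex, otherwise None.
  def groups(line: String): Option[List[String]] = Timing.unapplySeq(line)
}
```

A pattern such as Timing(total, view, db) is, as far as I can tell, just a convenient way of taking that list apart.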
line match {
case LogEntry(totalTime, viewTime, dbTime, responseCode, uri) => {
/* Process the data */
// do something with totalTime.toInt
// do something with viewTime.toInt
// etc ...
}
case _ => // Do nothing
}
Because you can use a pattern match, and patterns are matched in the order of definition, you can create several regular expressions representing the lines you want to extract, then process them easily using pattern matching.
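A minimal sketch of that idea, assuming two simplified line formats. Both patterns and the names Completed, Processing and describe are hypothetical and not taken from the actual parser:

```scala
import scala.util.matching.Regex

object LineClassifier {
  // Both patterns are hypothetical simplifications of Rails log lines.
  val Completed: Regex = """Completed in (\d+)ms.*""".r
  val Processing: Regex = """Processing (\w+)#(\w+).*""".r

  // Cases are tried top to bottom; the first regex matching the whole line wins.
  def describe(line: String): String = line match {
    case Completed(ms)                  => "completed in " + ms + "ms"
    case Processing(controller, action) => "processing " + controller + "." + action
    case _                              => "unrecognized"
  }
}
```

Adding support for another kind of line should then only take one more regex and one more case.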
Pretty powerful stuff. What would really make my day would be if someone knew how I could extract the values totalTime, viewTime, and dbTime as integers and not have to do a conversion – I’m already matching with \d+. Ideas?
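One possible approach, sketched here as an assumption rather than a known best practice: wrap the regex in a custom extractor object whose unapply does the conversion once, so the pattern match binds Ints directly. The names Timings, TimingsDemo and totalOf and the shortened pattern are invented for this example:

```scala
import scala.util.matching.Regex

object Timings {
  // Shortened, hypothetical version of the log pattern.
  private val Pattern: Regex = """Completed in (\d+)ms \(View: (\d+), DB: (\d+)\).*""".r

  // Custom extractor: reuse the regex, then convert each captured group to Int.
  // Note: toInt would still throw on a digit run too large for an Int.
  def unapply(line: String): Option[(Int, Int, Int)] = line match {
    case Pattern(total, view, db) => Some((total.toInt, view.toInt, db.toInt))
    case _                        => None
  }
}

object TimingsDemo {
  // total, view and db are bound as Ints here, with no conversion at the call site.
  def totalOf(line: String): Int = line match {
    case Timings(total, view, db) => total
    case _                        => -1
  }
}
```

The conversion still happens, but it lives in one place instead of being repeated in every case body.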
First impressions of Lift Scala web development framework from a Ruby on Rails developer
Over the past few months I’ve been hearing a lot about Scala and, in general, have been very interested. Scala, short for “scalable language” (which is why I will continue to pronounce it “skay-lah” rather than “skah-lah”), is a strongly typed JVM language that combines aspects of functional programming and dynamic languages with static typing and the JVM to provide a language that has some of the flexibility of languages such as Ruby or Haskell, but with the performance and interoperability of Java.
One of the projects leading the charge in Scala is Lift, a web development framework. I’ve been developing in Ruby on Rails for the past few years, and really, I don’t need to learn another framework. I took a very close look at Django and even built a few projects, but stuck with Rails as my primary tool of choice. Lift interests me for the following reasons:
- can run in any Java Application Server
- can run inline Java
Why are these important? I’ve met a lot of consultants who recommended a solution to a client that required deploying mod_php/Erlang/WSGI/Mongrel and had the project shot down. But switch the pitch to Java running in an application server? Happy client. I’ve been pushing JRuby hard on every other Rails developer I meet, so these requirements are not high on my list, though they add a lot of points. It is also worth mentioning that many, many Java based frameworks, such as Grails or the recently open sourced AribaWeb, can do these too.
- high performance* (have not seen with my own eyes)
- uses Scala
- out of the box Comet support
Comet is a way of simulating push applications over HTTP in a browser, which is otherwise purely a client pull environment. Comet is also known as long polling, and it is how every single JavaScript browser chat application works (Meebo, Facebook chat, etc). Basically, the client JavaScript opens an XMLHttpRequest (XHR for short) to the server, which the server does not respond to until there is data to be pushed (hence the name “long polling”). This gets rid of the need for clients to poll the server at intervals, which has two problems: at longer intervals, data is not pushed quickly enough, which would make an IM application unusable; at shorter intervals, browsers would quickly saturate their network connection as well as the server. Comet isn’t without its problems. Most Java applications, for instance, use a “one thread per request” model for each open connection, which does not scale efficiently: the threads become the limiting factor in the number of clients that can connect at once, even though most of the time the threads are idle. In fact, open connections were such a problem that Facebook’s chat implementation is completely written in Erlang, a concurrent language that can create millions of lightweight processes and was really the only implementation that could scale efficiently to their needs. They blog about it here: http://www.facebook.com/note.php?note_id=14218138919
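To make the idea concrete, here is a toy, in-process sketch of long polling with no HTTP involved: the "request" simply blocks on a queue until the "server" has something to push or a timeout expires. Everything in it (LongPollSketch, outbox, push, awaitMessage) is a hypothetical stand-in and not how Lift or any real Comet server is implemented:

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

object LongPollSketch {
  // Stand-in for the server's outbox for a single client; the name is hypothetical.
  private val outbox = new LinkedBlockingQueue[String]()

  // "Server" side: make a message available to the waiting client.
  def push(message: String): Unit = outbox.put(message)

  // "Client request": block until a message arrives or the timeout expires.
  // The calling thread is parked for the whole wait, which is exactly the
  // cost of the one-thread-per-request model described above.
  def awaitMessage(timeoutMs: Long): Option[String] =
    Option(outbox.poll(timeoutMs, TimeUnit.MILLISECONDS))
}
```

A real client would immediately issue the next request after each response or timeout, so there is almost always one connection held open.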
The way Lift deals with Comet scaling issues is by making use of Jetty Continuations. The thread-per-request model is still used; however, a request that is waiting for data is suspended and its thread is handed back to the pool until it is needed again, resulting in a much more efficient use of resources.
It’s these reasons that make Lift appealing to me to learn. However, after fiddling with Lift, there are a few things off the top of my head that I already don’t like much about Lift or are so different they threaten to make my brain reboot:
- Servers are not stateless. If you need to horizontally scale, your load balancer needs to read the JSESSIONID parameter in the HTTP request and direct traffic based on that information. I’ve been told that Lift is so incredibly high performance this isn’t necessary. This doesn’t answer the question of hot failover, and frankly, I was a bit disappointed.
- Everything depends on state! This is probably WHY Lift can be so high performance. Most web frameworks deal with requests as they come in, looking up the same data on every request to reinstantiate session objects, User objects, or any other objects that need to persist longer than a web request. Lift’s approach is such a different way of thinking about web development that experienced web developers coming to Lift from other frameworks will have to start with a blank slate. Lift encourages abstracting away the request/response cycle. It remains to be seen whether this is a good thing or a bad thing.
- Unintuitive way to add new pages. You have to add to a sitemap in what might be the most unintuitive manner possible:
val entries = Menu(Loc("Home", List("index"), "Home")) :: Menu(Loc("Test", List("test"), "Test")) :: User.sitemap
The :: operator prepends an element to a list (list concatenation is :::). This will make all your pages appear in the site menu. To make pages that don’t appear in the site menu?
Menu(Loc("MyHiddenPage", List("hidden"), "hidden", Hidden))
To me this seems unintuitive, but then again, I don’t actually understand what is happening here, as the API docs are unusable. I haven’t figured out routing yet, for instance. When I was learning Django, I figured out how to set up routes to functions in minutes.
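On the :: in the sitemap snippet: here is a small sketch of how it behaves, using plain strings in place of Lift's Menu and Loc objects. The object and value names are invented for the example and have nothing to do with Lift's API:

```scala
object ConsSketch {
  // Stand-in for something like User.sitemap: an existing list of entries.
  val userPages: List[String] = List("login", "logout")

  // :: prepends one element at a time; the rightmost operand must already be a list.
  val entries: List[String] = "index" :: "test" :: userPages

  // ::: joins two whole lists and produces the same result here.
  val joined: List[String] = List("index", "test") ::: userPages
}
```

Read right to left, the sitemap line builds its list one Menu at a time on top of whatever list sits at the far right.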
- Scala shorthand. I’ve mentioned this and all the functional programmers have screamed bloody murder. Code is for human beings, not cyborgs. I understand that it’s clever to save keystrokes:
list.sort(_ < _)
As opposed to, say, Ruby:
list.sort { |a, b| a <=> b }
But these are trivial examples. There’s code all over the place that looks like this:
fun1(a, b _).call(_).fun2(_ <= _)
I’m sorry, but that’s not very welcoming. The worst part is that Scala HAS verbose syntax. You can use underscore notation, or you can use:
() => something
(x) => x.something
I was at a job interview once where I was asked to write code to solve a problem using any language I wanted. I wrote a monstrous Ruby one-liner that was fun to write but that I would never, ever write in real life if I expected other developers to read my code. Sometimes there really is such a thing as too clever.
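For comparison, here is a small sketch of the same sort written both ways. It uses sortWith, which as far as I know is the method that took over from List.sort in later Scala versions; the object and value names are made up for the example:

```scala
object PlaceholderSketch {
  val list: List[Int] = List(3, 1, 2)

  // Underscore shorthand: each _ stands for the next parameter, in order.
  val terse: List[Int] = list.sortWith(_ < _)

  // The same comparison with named, explicitly typed parameters.
  val verbose: List[Int] = list.sortWith((a: Int, b: Int) => a < b)
}
```

Both produce the same sorted list, so choosing the longer form costs nothing but keystrokes.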
- Dearth of working tutorials or documentation* (I plan on writing tutorials for as long as I am learning or interested in Lift)
In spite of these things I still think Lift has an interesting approach to many of the problems of web development. Rather than judge Lift based on my initial impressions, I’ll get into it some more before I decide if it’s a technology that I’d push. Ruby On Rails was one such technology, and in spite of all of its problems I still view it as an amazing development platform. The difference here is that by the time I started learning, several books had already been written, and there were plenty of tutorials on the web.
One of the problems with Lift is that most of the tutorials I have seen so far are written by actual developers of Lift. As a developer, it’s hard to figure out what people need to know. There’s a tendency to say to new developers coming on, “You only need to know A, B and C.” Then, thirty minutes later, when the new guy is completely lost, “Oh, and D! Sorry!” As a newbie, trust me, I won’t miss D. It’ll be in my face such that I’ll get mad, stop coding, then complain on Twitter, on the mailing list and in this blog.
As for the immediate future, I’ll continue writing tutorials for Lift, beginning with a tutorial coming soon about how to get started developing on Lift with NetBeans. Stay tuned.
