Ikai Lan says

I say things!

JRuby In-Memory Search Example With Lucene 3.0.1

Just for giggles, I decided to port the in-memory search example from my last blog post to JRuby. It’s been some time since I’ve used JRuby for anything, but the team has been hard at work improving Java interoperability and ease of use. I downloaded JRuby 1.5.0_RC1, added its /bin directory to my PATH, and began hacking.

I’m incredibly impressed with the level of Java interop and startup speed improvements. Kudos to the JRuby team. Integrating Java couldn’t have been easier.

The example is below. Run it with the command:


jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb

require 'java'
# Load the Lucene JAR either by passing the -r flag to JRuby:
#   jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
# or by uncommenting the require line below:
# require "lucene-core-3.0.1.jar"

java_import org.apache.lucene.analysis.standard.StandardAnalyzer
java_import org.apache.lucene.document.Document
java_import org.apache.lucene.document.Field
java_import org.apache.lucene.index.IndexWriter
java_import org.apache.lucene.queryParser.ParseException
java_import org.apache.lucene.queryParser.QueryParser
java_import org.apache.lucene.store.RAMDirectory
java_import org.apache.lucene.util.Version

java_import org.apache.lucene.search.IndexSearcher
java_import org.apache.lucene.search.TopScoreDocCollector


def create_document(title, content)
  doc = Document.new
  doc.add Field.new("title", title, Field::Store::YES, Field::Index::NO)
  doc.add Field.new("content", content, Field::Store::YES, Field::Index::ANALYZED)  
  doc
end

def create_index
  idx     = RAMDirectory.new
  writer  = IndexWriter.new(idx, StandardAnalyzer.new(Version::LUCENE_30), IndexWriter::MaxFieldLength::LIMITED)

  writer.add_document(create_document("Theodore Roosevelt",
          "It behooves every man to remember that the work of the " +
                  "critic, is of altogether secondary importance, and that, " +
                  "in the end, progress is accomplished by the man who does " +
                  "things."))
  writer.add_document(create_document("Friedrich Hayek",
          "The case for individual freedom rests largely on the " +
                  "recognition of the inevitable and universal ignorance " +
                  "of all of us concerning a great many of the factors on " +
                  "which the achievements of our ends and welfare depend."))
  writer.add_document(create_document("Ayn Rand",
          "There is nothing to take a man's freedom away from " +
                  "him, save other men. To be free, a man must be free " +
                  "of his brothers."))
  writer.add_document(create_document("Mohandas Gandhi",
          "Freedom is not worth having if it does not connote " +
                  "freedom to err."))

  writer.optimize
  writer.close
  idx
end

def search(searcher, query_string)
  parser = QueryParser.new(Version::LUCENE_30, "content", StandardAnalyzer.new(Version::LUCENE_30))
  query = parser.parse(query_string)
  
  hits_per_page = 10
  
  collector = TopScoreDocCollector.create(5 * hits_per_page, false)
  searcher.search(query, collector)
  
  # Notice how this differs from the Java version: JRuby automagically maps
  # snake_case method names onto camelCase Java methods, but scoreDocs is not a
  # method: it's a public field. That's why we have to use camelCase here;
  # otherwise JRuby would complain that score_docs is an undefined method.
  hits = collector.top_docs.scoreDocs
  
  hit_count = collector.get_total_hits
    
  if hit_count.zero?
    puts "No matching documents."
  else
    puts "%d total matching documents" % hit_count
    
    puts "Hits for %s were found in quotes by:" % query_string
    
    hits.each_with_index do |score_doc, i|
      doc_id = score_doc.doc
      doc_score = score_doc.score
      
      puts "doc_id: %s \t score: %s" % [doc_id, doc_score]
      
      doc = searcher.doc(doc_id)
      puts "%d. %s" % [i + 1, doc.get("title")]  # number results from 1, not 0
      puts "Content: %s" % doc.get("content")
      puts
      
    end
    
  end

end

def main
  index = create_index
  searcher = IndexSearcher.new(index)

  search(searcher, "freedom")
  search(searcher, "free")
  search(searcher, "progress or achievements")
  search(searcher, "ikaisays.com")

  searcher.close
end

main()
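
The comment in search about snake_case versus camelCase is easier to see with a toy model. What follows is a plain-Ruby sketch of the naming convention as I understand it, not JRuby's actual implementation: the helpers snake_case and ruby_aliases are hypothetical names made up for illustration, and the model only covers plain methods and get-style getters.

```ruby
# Illustrative model only: this is NOT JRuby's implementation, just a
# plain-Ruby sketch of the naming convention. Both helpers are hypothetical.

# Convert a Java camelCase identifier to Ruby snake_case.
def snake_case(java_name)
  java_name.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
end

# List the names under which a Java *method* would be callable from Ruby,
# assuming the convention described in the search comment.
def ruby_aliases(java_method)
  names = [java_method, snake_case(java_method)]
  # Bean-style getters also get a short, property-style name.
  getter = java_method.match(/\Aget([A-Z]\w*)\z/)
  names << snake_case(getter[1]) if getter
  names.uniq
end

p ruby_aliases("getTotalHits")
p ruby_aliases("topDocs")
```

Under this model, getTotalHits is reachable as get_total_hits (what the script uses) and should also be reachable as total_hits, while topDocs becomes top_docs. Fields such as scoreDocs are not methods, so no snake_case alias exists for them and the Java name has to be used as-is.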

Written by Ikai Lan

April 25, 2010 at 7:49 pm

Posted in JRuby, Ruby, Software Development

2 Responses

  1. Cool !
    I have also done this, see http://github.com/andreasronge/neo4j
    The lucene code (lib/lucene) does not have any dependencies to neo4j so it would be easy to create a gem with only the lucene stuff.
    In the rjb branch I also have lucene working together with CRuby by using the RJB (jni) library.

    Andreas Ronge

    April 26, 2010 at 4:25 am

  2. […] I just got back from Google-sponsored hack session at RailsConf. Google App Engine and JRuby combine to create some real awesomeness. I created an small app that uses some classes from Lucene (written in Java) with a Rails app (written in Ruby) to search text stored in the App Engine data store. The idea was to search English text using standard linguistic analysis to distinguish whole words, as well as find variants of the same root word (e.g. find “run” and “running” when searching for run, but don’t find “runt”). It was helpful to read prior work by Ikai Lan. […]

