Ikai Lan says

JRuby In-Memory Search Example With Lucene 3.0.1

Just for giggles, I decided to port the in-memory search example from my last blog post to JRuby. It’s been some time since I’ve used JRuby for anything, but the team has been hard at work improving Java interoperability and ease of use. I downloaded JRuby 1.5.0_RC1, added its bin directory to my PATH, and began hacking.

I’m incredibly impressed with the level of Java interop and startup speed improvements. Kudos to the JRuby team. Integrating Java couldn’t have been easier.

The example is below. Run it with the command:


jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb

require 'java'
# Either require the JAR file directly by uncommenting the require line
# below, or pass the -r flag to JRuby as follows:
# jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
# require "lucene-core-3.0.1.jar"

java_import org.apache.lucene.analysis.standard.StandardAnalyzer
java_import org.apache.lucene.document.Document
java_import org.apache.lucene.document.Field
java_import org.apache.lucene.index.IndexWriter
java_import org.apache.lucene.queryParser.ParseException
java_import org.apache.lucene.queryParser.QueryParser
java_import org.apache.lucene.store.RAMDirectory
java_import org.apache.lucene.util.Version

java_import org.apache.lucene.search.IndexSearcher
java_import org.apache.lucene.search.TopScoreDocCollector


def create_document(title, content)
  doc = Document.new
  doc.add Field.new("title", title, Field::Store::YES, Field::Index::NO)
  doc.add Field.new("content", content, Field::Store::YES, Field::Index::ANALYZED)  
  doc
end

def create_index
  idx     = RAMDirectory.new
  writer  = IndexWriter.new(idx, StandardAnalyzer.new(Version::LUCENE_30), IndexWriter::MaxFieldLength::LIMITED)

  writer.add_document(create_document("Theodore Roosevelt",
          "It behooves every man to remember that the work of the " +
                  "critic, is of altogether secondary importance, and that, " +
                  "in the end, progress is accomplished by the man who does " +
                  "things."))
  writer.add_document(create_document("Friedrich Hayek",
          "The case for individual freedom rests largely on the " +
                  "recognition of the inevitable and universal ignorance " +
                  "of all of us concerning a great many of the factors on " +
                  "which the achievements of our ends and welfare depend."))
  writer.add_document(create_document("Ayn Rand",
          "There is nothing to take a man's freedom away from " +
                  "him, save other men. To be free, a man must be free " +
                  "of his brothers."))
  writer.add_document(create_document("Mohandas Gandhi",
          "Freedom is not worth having if it does not connote " +
                  "freedom to err."))

  writer.optimize
  writer.close
  idx
end

def search(searcher, query_string)
  parser = QueryParser.new(Version::LUCENE_30, "content", StandardAnalyzer.new(Version::LUCENE_30))
  query = parser.parse(query_string)
  
  hits_per_page = 10
  
  collector = TopScoreDocCollector.create(5 * hits_per_page, false)
  searcher.search(query, collector)
  
  # Notice how this differs from the Java version: JRuby automagically exposes
  # Java camelCaseMethods under snake_case aliases, but scoreDocs is not a method:
  # it's a field. That's why we have to use camelCase here, otherwise JRuby would
  # complain that score_docs is an undefined method.
  hits = collector.top_docs.scoreDocs
  
  hit_count = collector.get_total_hits
    
  if hit_count.zero?
    puts "No matching documents."
  else
    puts "%d total matching documents" % hit_count
    
    puts "Hits for %s were found in quotes by:" % query_string
    
    hits.each_with_index do |score_doc, i|
      doc_id = score_doc.doc
      doc_score = score_doc.score
      
      puts "doc_id: %s \t score: %s" % [doc_id, doc_score]
      
      doc = searcher.doc(doc_id)
      puts "%d. %s" % [i + 1, doc.get("title")]
      puts "Content: %s" % doc.get("content")
      puts
      
    end
    
  end

end

def main
  index = create_index
  searcher = IndexSearcher.new(index)

  search(searcher, "freedom")
  search(searcher, "free")
  search(searcher, "progress or achievements")
  search(searcher, "ikaisays.com")

  searcher.close
end

main()
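
The comment in search about method names versus fields is easier to see in a smaller example. Here is a minimal sketch of the naming convention. Note that camelize is a hypothetical helper written only for illustration (it is not how JRuby implements its aliases), and the java.lang.System calls run only when the script is executed under JRuby.

```ruby
# Hypothetical helper for illustration only: shows the snake_case to
# camelCase naming convention. This is NOT JRuby's actual implementation.
def camelize(snake_name)
  head, *rest = snake_name.to_s.split("_")
  head.to_s + rest.map(&:capitalize).join
end

puts camelize("get_total_hits") # prints getTotalHits
puts camelize("top_docs")       # prints topDocs

# Under JRuby, a Java *method* can be called with either spelling.
# The platform check lets this file load on other Ruby implementations too.
if RUBY_PLATFORM == "java"
  require "java"
  puts java.lang.System.current_time_millis # snake_case alias
  puts java.lang.System.currentTimeMillis   # original Java name
end
```

As the comment in search points out, scoreDocs is a field rather than a method, so in the example above only the camelCase spelling worked for it.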

Written by Ikai Lan

April 25, 2010 at 7:49 pm

Posted in JRuby, Ruby, Software Development
