JRuby In-Memory Search Example With Lucene 3.0.1
Just for giggles, I decided to port the in-memory search example from my last blog post to JRuby. It’s been some time since I’ve used JRuby for anything, but the team has clearly kept working toward better Java interoperability and ease of use. I downloaded JRuby 1.5.0_RC1, added its bin directory to my PATH, and began hacking.
I’m incredibly impressed with both the Java interop and the startup speed improvements. Kudos to the JRuby team. Integrating Java couldn’t have been easier.
The example is below. Run it with the command:
jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
```ruby
require 'java'

# You can either require the JAR file with the next line, or pass the -r
# flag to JRuby as follows:
#   jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
# require "lucene-core-3.0.1.jar"

java_import org.apache.lucene.analysis.standard.StandardAnalyzer
java_import org.apache.lucene.document.Document
java_import org.apache.lucene.document.Field
java_import org.apache.lucene.index.IndexWriter
java_import org.apache.lucene.queryParser.ParseException
java_import org.apache.lucene.queryParser.QueryParser
java_import org.apache.lucene.store.RAMDirectory
java_import org.apache.lucene.util.Version
java_import org.apache.lucene.search.IndexSearcher
java_import org.apache.lucene.search.TopScoreDocCollector

# The title is stored but not indexed; the content is stored and analyzed
# so it can be searched.
def create_document(title, content)
  doc = Document.new
  doc.add Field.new("title", title, Field::Store::YES, Field::Index::NO)
  doc.add Field.new("content", content, Field::Store::YES, Field::Index::ANALYZED)
  doc
end

# Builds the whole index inside a RAMDirectory, i.e. entirely in memory.
def create_index
  idx = RAMDirectory.new
  writer = IndexWriter.new(idx, StandardAnalyzer.new(Version::LUCENE_30),
                           IndexWriter::MaxFieldLength::LIMITED)

  writer.add_document(create_document("Theodore Roosevelt",
    "It behooves every man to remember that the work of the " +
    "critic, is of altogether secondary importance, and that, " +
    "in the end, progress is accomplished by the man who does " +
    "things."))
  writer.add_document(create_document("Friedrich Hayek",
    "The case for individual freedom rests largely on the " +
    "recognition of the inevitable and universal ignorance " +
    "of all of us concerning a great many of the factors on " +
    "which the achievements of our ends and welfare depend."))
  writer.add_document(create_document("Ayn Rand",
    "There is nothing to take a man's freedom away from " +
    "him, save other men. To be free, a man must be free " +
    "of his brothers."))
  writer.add_document(create_document("Mohandas Gandhi",
    "Freedom is not worth having if it does not connote " +
    "freedom to err."))

  writer.optimize
  writer.close
  idx
end

def search(searcher, query_string)
  parser = QueryParser.new(Version::LUCENE_30, "content",
                           StandardAnalyzer.new(Version::LUCENE_30))
  query = parser.parse(query_string)

  hits_per_page = 10
  collector = TopScoreDocCollector.create(5 * hits_per_page, false)
  searcher.search(query, collector)

  # Notice how this differs from the Java version: JRuby lets you call Java
  # methods by snake_case names (top_docs for topDocs), but scoreDocs is a
  # public field, not a method, so it gets no snake_case alias. That's why we
  # have to use camelCase here; score_docs would fail as an undefined method.
  hits = collector.top_docs.scoreDocs
  hit_count = collector.get_total_hits

  if hit_count.zero?
    puts "No matching documents."
  else
    puts "%d total matching documents" % hit_count
    puts "Hits for %s were found in quotes by:" % query_string
    hits.each_with_index do |score_doc, i|
      doc_id = score_doc.doc
      doc_score = score_doc.score
      puts "doc_id: %s \t score: %s" % [doc_id, doc_score]
      doc = searcher.doc(doc_id)
      # Number the results from 1 rather than 0.
      puts "%d. %s" % [i + 1, doc.get("title")]
      puts "Content: %s" % doc.get("content")
      puts
    end
  end
end

def main
  index = create_index
  searcher = IndexSearcher.new(index)

  search(searcher, "freedom")
  search(searcher, "free")
  search(searcher, "progress or achievements")
  search(searcher, "ikaisays.com")

  searcher.close
end

main
```
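The scoreDocs comment in the search method depends on how JRuby names Java members. As a rough illustration, here is a pure-Ruby sketch that models the extra names JRuby binds for a Java method: a snake_case form, plus a property-style name for bean getters such as getTotalHits. The helpers snake_case and ruby_aliases are hypothetical and simplified. This is not JRuby's actual implementation, and it ignores setters, is-prefixed getters and overloads.

```ruby
# Hypothetical, simplified model of the extra Ruby names JRuby binds for a
# Java method. Illustration only, not JRuby's real implementation.

# Converts a camelCase Java name to snake_case: "getTotalHits" -> "get_total_hits"
def snake_case(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Returns the names a Ruby caller could use for one Java method,
# e.g. "topDocs" -> ["topDocs", "top_docs"].
def ruby_aliases(java_method)
  names = [java_method, snake_case(java_method)]
  # Bean-style getters also get a property-style name: getTotalHits -> total_hits
  if (m = java_method.match(/\Aget([A-Z]\w*)\z/))
    property = m[1][0].downcase + m[1][1..-1]
    names << property << snake_case(property)
  end
  names.uniq
end

%w[topDocs getTotalHits].each do |java_method|
  puts "#{java_method}: #{ruby_aliases(java_method).join(', ')}"
end
```

Under this model, and assuming JRuby's usual naming rules, collector.total_hits would be an equivalent spelling of collector.get_total_hits. A public field such as TopDocs.scoreDocs is not a method at all, so it only answers to its original name.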