Archive for April 25th, 2010
JRuby In-Memory Search Example With Lucene 3.0.1
Just for giggles, I decided to port the in-memory search example from my last blog post to JRuby. It's been some time since I've used JRuby for anything, but the team has been hard at work improving Java interoperability and ease of use. I downloaded JRuby 1.5.0_RC1, added its bin directory to my PATH, and began hacking.
I'm incredibly impressed with the improvements in Java interop and startup speed. Kudos to the JRuby team. Integrating Java couldn't have been easier.
The example is below. Run it with the command:

```sh
jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
```
```ruby
require 'java'

# Either pass the Lucene JAR to JRuby with the -r flag:
#   jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
# or uncomment the next line to require the JAR from the script itself:
# require "lucene-core-3.0.1.jar"

java_import org.apache.lucene.analysis.standard.StandardAnalyzer
java_import org.apache.lucene.document.Document
java_import org.apache.lucene.document.Field
java_import org.apache.lucene.index.IndexWriter
java_import org.apache.lucene.queryParser.ParseException
java_import org.apache.lucene.queryParser.QueryParser
java_import org.apache.lucene.store.RAMDirectory
java_import org.apache.lucene.util.Version
java_import org.apache.lucene.search.IndexSearcher
java_import org.apache.lucene.search.TopScoreDocCollector

# The title is stored but not indexed; the content is stored and analyzed,
# so only the content field is searchable.
def create_document(title, content)
  doc = Document.new
  doc.add Field.new("title", title, Field::Store::YES, Field::Index::NO)
  doc.add Field.new("content", content, Field::Store::YES, Field::Index::ANALYZED)
  doc
end

# Builds the index in a RAMDirectory, i.e. entirely in memory.
def create_index
  idx = RAMDirectory.new
  writer = IndexWriter.new(idx, StandardAnalyzer.new(Version::LUCENE_30),
                           IndexWriter::MaxFieldLength::LIMITED)
  writer.add_document(create_document("Theodore Roosevelt",
    "It behooves every man to remember that the work of the " +
    "critic, is of altogether secondary importance, and that, " +
    "in the end, progress is accomplished by the man who does " +
    "things."))
  writer.add_document(create_document("Friedrich Hayek",
    "The case for individual freedom rests largely on the " +
    "recognition of the inevitable and universal ignorance " +
    "of all of us concerning a great many of the factors on " +
    "which the achievements of our ends and welfare depend."))
  writer.add_document(create_document("Ayn Rand",
    "There is nothing to take a man's freedom away from " +
    "him, save other men. To be free, a man must be free " +
    "of his brothers."))
  writer.add_document(create_document("Mohandas Gandhi",
    "Freedom is not worth having if it does not connote " +
    "freedom to err."))
  writer.optimize
  writer.close
  idx
end

def search(searcher, query_string)
  parser = QueryParser.new(Version::LUCENE_30, "content", StandardAnalyzer.new(Version::LUCENE_30))
  query = parser.parse(query_string)
  hits_per_page = 10
  # Keep the 5 * hits_per_page (50) best-scoring hits.
  collector = TopScoreDocCollector.create(5 * hits_per_page, false)
  searcher.search(query, collector)

  # Notice how this differs from the Java version: JRuby lets you call Java
  # methods by snake_case names (top_docs for topDocs()), but scoreDocs is a
  # public field, not a method, so it gets no snake_case alias. That's why it
  # stays camelCase here; with score_docs, JRuby would complain about an
  # undefined method.
  hits = collector.top_docs.scoreDocs
  hit_count = collector.get_total_hits

  if hit_count.zero?
    puts "No matching documents."
  else
    puts "%d total matching documents" % hit_count
    puts "Hits for %s were found in quotes by:" % query_string
    hits.each_with_index do |score_doc, i|
      doc_id = score_doc.doc
      doc_score = score_doc.score
      puts "doc_id: %s \t score: %s" % [doc_id, doc_score]
      doc = searcher.doc(doc_id)
      # Number the hits starting from 1.
      puts "%d. %s" % [i + 1, doc.get("title")]
      puts "Content: %s" % doc.get("content")
      puts
    end
  end
end

def main
  index = create_index
  searcher = IndexSearcher.new(index)
  search(searcher, "freedom")
  search(searcher, "free")
  search(searcher, "progress or achievements")
  search(searcher, "ikaisays.com")
  searcher.close
end

main
```
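The naming rule described in the comment inside `search` can be illustrated in plain Ruby. The helper below, `java_method_name`, is hypothetical and not part of JRuby. It only mimics the correspondence between a snake_case call such as `top_docs` and the camelCase Java method `topDocs()`. JRuby applies that mapping to methods, so a public field like `scoreDocs` still has to be spelled exactly as it is in Java.

```ruby
# Hypothetical helper (NOT part of JRuby): turns a Ruby-style snake_case
# method name into the camelCase Java method name it corresponds to.
def java_method_name(ruby_name)
  first, *rest = ruby_name.to_s.split("_")
  first + rest.map(&:capitalize).join
end

# Snake_case calls used in the example and the Java methods they reach.
%w[top_docs get_total_hits add_document].each do |name|
  puts "#{name} -> #{java_method_name(name)}"
end
# top_docs -> topDocs
# get_total_hits -> getTotalHits
# add_document -> addDocument
```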
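To see roughly what the Lucene calls are doing, here is a pure-Ruby analogue that runs on any Ruby, with or without JRuby. It is a hypothetical sketch, not Lucene's algorithm. `ToyIndex`, its crude tokenizer, its small stop-word list, and its count-based scoring are simplifications assumed for illustration. It follows the shape of the example: only the content is indexed (the title is merely stored), a query matches any of its terms, and results come back best-first and capped at a limit, loosely like `TopScoreDocCollector`.

```ruby
# Hypothetical toy analogue of the Lucene example (no Lucene involved).
# Scores are simple counts of matching term occurrences, which is NOT how
# Lucene scores documents, so the numbers will differ from Lucene's.
class ToyIndex
  Doc = Struct.new(:title, :content)

  # Small illustrative stop-word list (an assumption, not Lucene's exact set).
  STOP_WORDS = %w[a an and in is it of or the to].freeze

  def initialize
    @docs = []
    # term => { doc_id => occurrences of term in that doc }
    @postings = Hash.new { |hash, term| hash[term] = Hash.new(0) }
  end

  # Crude stand-in for StandardAnalyzer: lowercase, split on anything that
  # is not a letter or digit, and drop stop words.
  def self.analyze(text)
    text.downcase.scan(/[a-z0-9]+/) - STOP_WORDS
  end

  # Only the content goes into the postings; the title is just stored,
  # echoing Field::Index::NO vs Field::Index::ANALYZED in the Lucene version.
  def add(title, content)
    doc_id = @docs.size
    @docs << Doc.new(title, content)
    self.class.analyze(content).each { |term| @postings[term][doc_id] += 1 }
    doc_id
  end

  # Any query term may match (OR semantics). Returns up to `limit`
  # [Doc, score] pairs, highest score first, ties broken by insertion order.
  def search(query, limit = 10)
    scores = Hash.new(0)
    self.class.analyze(query).each do |term|
      # fetch avoids creating empty postings for unknown terms
      @postings.fetch(term, {}).each { |doc_id, freq| scores[doc_id] += freq }
    end
    ranked = scores.sort_by { |doc_id, score| [-score, doc_id] }
    ranked.first(limit).map { |doc_id, score| [@docs[doc_id], score] }
  end
end

index = ToyIndex.new
index.add("Ayn Rand",
          "There is nothing to take a man's freedom away from " +
          "him, save other men. To be free, a man must be free " +
          "of his brothers.")
index.add("Mohandas Gandhi",
          "Freedom is not worth having if it does not connote " +
          "freedom to err.")

index.search("freedom").each { |doc, score| puts "#{doc.title}: #{score}" }
# Mohandas Gandhi: 2
# Ayn Rand: 1
```

Keeping a term-to-documents map means a search only touches documents that contain a query term, which is the core idea behind Lucene's inverted index as well.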
