JRuby In-Memory Search Example With Lucene 3.0.1
Just for giggles I decided to port the In-Memory search example from my last blog post to JRuby. It’s been some time since I’ve used JRuby for anything, but the team has still been hard at work making strides towards better Java interoperability and ease of use. I downloaded JRuby 1.5.0_RC1, pointed my PATH to the /bin directory, and began hacking.
I’m incredibly impressed with the level of Java interop and startup speed improvements. Kudos to the JRuby team. Integrating Java couldn’t have been easier.
The example is below. Run it with the command:
jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
require 'java'
# Either load the JAR by uncommenting the require line below, or pass
# the -r flag to JRuby as follows:
# jruby -r /path/to/lucene-core-3.0.1.jar inmemory.rb
# require "lucene-core-3.0.1.jar"
java_import org.apache.lucene.analysis.standard.StandardAnalyzer
java_import org.apache.lucene.document.Document
java_import org.apache.lucene.document.Field
java_import org.apache.lucene.index.IndexWriter
java_import org.apache.lucene.queryParser.ParseException
java_import org.apache.lucene.queryParser.QueryParser
java_import org.apache.lucene.store.RAMDirectory
java_import org.apache.lucene.util.Version
java_import org.apache.lucene.search.IndexSearcher
java_import org.apache.lucene.search.TopScoreDocCollector
def create_document(title, content)
  doc = Document.new
  doc.add Field.new("title", title, Field::Store::YES, Field::Index::NO)
  doc.add Field.new("content", content, Field::Store::YES, Field::Index::ANALYZED)
  doc
end
def create_index
  idx = RAMDirectory.new
  writer = IndexWriter.new(idx, StandardAnalyzer.new(Version::LUCENE_30), IndexWriter::MaxFieldLength::LIMITED)
  writer.add_document(create_document("Theodore Roosevelt",
    "It behooves every man to remember that the work of the " +
    "critic, is of altogether secondary importance, and that, " +
    "in the end, progress is accomplished by the man who does " +
    "things."))
  writer.add_document(create_document("Friedrich Hayek",
    "The case for individual freedom rests largely on the " +
    "recognition of the inevitable and universal ignorance " +
    "of all of us concerning a great many of the factors on " +
    "which the achievements of our ends and welfare depend."))
  writer.add_document(create_document("Ayn Rand",
    "There is nothing to take a man's freedom away from " +
    "him, save other men. To be free, a man must be free " +
    "of his brothers."))
  writer.add_document(create_document("Mohandas Gandhi",
    "Freedom is not worth having if it does not connote " +
    "freedom to err."))
  writer.optimize
  writer.close
  idx
end
def search(searcher, query_string)
  parser = QueryParser.new(Version::LUCENE_30, "content", StandardAnalyzer.new(Version::LUCENE_30))
  query = parser.parse(query_string)
  hits_per_page = 10
  collector = TopScoreDocCollector.create(5 * hits_per_page, false)
  searcher.search(query, collector)
  # Notice how this differs from the Java version: JRuby automagically maps
  # snake_case method names onto Java's camelCase methods, but scoreDocs is not
  # a method: it's a public field. That's why we have to use camelCase here,
  # otherwise JRuby would complain that score_docs is an undefined method.
  hits = collector.top_docs.scoreDocs
  hit_count = collector.get_total_hits
  if hit_count.zero?
    puts "No matching documents."
  else
    puts "%d total matching documents" % hit_count
    puts "Hits for %s were found in quotes by:" % query_string
    hits.each_with_index do |score_doc, i|
      doc_id = score_doc.doc
      doc_score = score_doc.score
      puts "doc_id: %s \t score: %s" % [doc_id, doc_score]
      doc = searcher.doc(doc_id)
      # each_with_index counts from 0, so add 1 for a human-readable rank
      puts "%d. %s" % [i + 1, doc.get("title")]
      puts "Content: %s" % doc.get("content")
      puts
    end
  end
end
def main
  index = create_index
  searcher = IndexSearcher.new(index)
  search(searcher, "freedom")
  search(searcher, "free")
  search(searcher, "progress or achievements")
  search(searcher, "ikaisays.com")
  searcher.close
end
main()
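The naming note in the comment inside the search method can be illustrated with a few lines of plain Ruby. This is only a sketch of the snake_case to camelCase mapping, and to_java_name is a hypothetical helper written for this illustration; it is not part of JRuby and does not reflect how JRuby implements the lookup internally. It runs on any Ruby, with no Lucene JAR required.

```ruby
# Hypothetical helper (not a JRuby API): turns a Ruby-style snake_case
# name into the Java-style camelCase name it corresponds to.
def to_java_name(ruby_name)
  head, *rest = ruby_name.to_s.split("_")
  # Keep the first word as-is and capitalize every following word.
  head.to_s + rest.map(&:capitalize).join
end

# Method names used in the example above and their Java spellings.
%w[top_docs get_total_hits add_document].each do |name|
  puts "#{name} -> #{to_java_name(name)}"
end
```

The mapping applies to methods such as topDocs() and getTotalHits(). As the comment in the search method explains, scoreDocs is a field rather than a method, so it has to be written in its original Java spelling.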
Cool!
I have also done this, see http://github.com/andreasronge/neo4j
The Lucene code (lib/lucene) does not have any dependencies on neo4j, so it would be easy to create a gem with only the Lucene stuff.
In the rjb branch I also have Lucene working together with CRuby by using the RJB (JNI) library.
Andreas Ronge
April 26, 2010 at 4:25 am
[...] I just got back from a Google-sponsored hack session at RailsConf. Google App Engine and JRuby combine to create some real awesomeness. I created a small app that uses some classes from Lucene (written in Java) with a Rails app (written in Ruby) to search text stored in the App Engine data store. The idea was to search English text using standard linguistic analysis to distinguish whole words, as well as find variants of the same root word (e.g. find “run” and “running” when searching for run, but don’t find “runt”). It was helpful to read prior work by Ikai Lan. [...]
full text search on app engine | the evolving ultrasaurus
June 10, 2010 at 8:58 am