Dr Frank Blog: jTool: Apache Lucene

Thursday, May 10, 2012

jTool: Apache Lucene

I had a full-day client briefing today in St. Louis on Big Data analytics for enterprise research. During the briefing, architects and developers on both sides mentioned Lucene more than a dozen times as a useful tool for text indexing, even at the enterprise level.

For an open-source project, it says a lot that two 100-year-old companies agree on its importance.

So I am now adding a second Apache project (the first being Apache Hadoop) to the jTool catalog.

Overview of Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one that has to run across platforms.

Lucene offers powerful features through a simple API:

Scalable, High-Performance Indexing

over 95GB/hour on modern hardware
small RAM requirements -- only 1MB heap
incremental indexing as fast as batch indexing
index size roughly 20-30% the size of text indexed
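
To make the indexing idea above a little more concrete, here is a minimal sketch of an inverted index, the kind of term-to-document structure a full-text search library is built around. This is not Lucene's API or implementation: the class name `TinyInvertedIndex` and its methods are hypothetical, everything lives in memory, and the tokenizer is deliberately naive. It only shows why documents can be added one at a time (incremental indexing) without rebuilding anything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy class for illustration only; this is NOT Lucene's API.
class TinyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Adds one document and returns its id. Each call only touches the
    // postings of the terms in that document, so adding incrementally
    // costs the same per document as adding in one batch.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of the documents containing the term, ascending.
    List<Integer> lookup(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ids);
    }

    // Naive tokenizer: lowercase, then split on anything that is not a
    // letter or a digit. Real analyzers do far more than this.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a search library");   // doc 0
        index.add("Hadoop stores big data");       // doc 1
        index.add("Search big data with Lucene");  // doc 2
        System.out.println(index.lookup("lucene")); // prints [0, 2]
        System.out.println(index.lookup("data"));   // prints [1, 2]
    }
}
```

A lookup here is a single map access followed by a copy of the matching ids, which is the basic reason an index answers queries much faster than scanning the original text.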

Powerful, Accurate and Efficient Search Algorithms

ranked searching -- best results returned first
many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
fielded searching (e.g., title, author, contents)
date-range searching
sorting by any field
multiple-index searching with merged results
allows simultaneous update and searching
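
Two of the search features listed above, fielded searching and ranked results, can be illustrated with another small sketch. Again, this is an assumption-laden toy and not Lucene's API: `TinyFieldedSearch` is a hypothetical class, the only fields are `title` and `contents`, and the score is simply the number of times the term occurs in the chosen field. Lucene's actual scoring is considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy class for illustration only; this is NOT Lucene's API.
class TinyFieldedSearch {
    // Each document is a map from field name to that field's text.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String title, String contents) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("contents", contents);
        docs.add(doc);
    }

    // Counts occurrences of an already-lowercased term in one field.
    private static int termFrequency(Map<String, String> doc, String field, String term) {
        String text = doc.getOrDefault(field, "");
        int count = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Searches a single field and returns the titles of matching
    // documents, best match first. Score = raw term frequency.
    List<String> search(String field, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<int[]> hits = new ArrayList<>(); // each entry: {docIndex, score}
        for (int i = 0; i < docs.size(); i++) {
            int score = termFrequency(docs.get(i), field, needle);
            if (score > 0) {
                hits.add(new int[] {i, score});
            }
        }
        // Highest score first; the sort is stable, so ties keep insertion order.
        hits.sort(Comparator.comparingInt((int[] h) -> h[1]).reversed());
        List<String> titles = new ArrayList<>();
        for (int[] h : hits) {
            titles.add(docs.get(h[0]).get("title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        TinyFieldedSearch s = new TinyFieldedSearch();
        s.add("Lucene in Action", "Lucene indexing and Lucene search explained");
        s.add("Hadoop Guide", "Storing big data; Lucene is mentioned once");
        s.add("Search Basics", "Ranking and relevance");
        System.out.println(s.search("contents", "lucene")); // prints [Lucene in Action, Hadoop Guide]
        System.out.println(s.search("title", "lucene"));    // prints [Lucene in Action]
    }
}
```

The same term gives different results depending on the field searched, and within a field the document that mentions the term more often comes first.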

Cross-Platform Solution

Available as open-source software under the Apache License, which lets you use Lucene in both commercial and open-source programs
100%-pure Java
Implementations in other programming languages available that are index-compatible

Links

Apache Lucene Core Project
