Lucene

About

Lucene 1) is a text search engine library.

The following applications are built on Lucene:

Structure

The text data model of Lucene is based on the following concepts 2):

An index contains a sequence of documents.

  • A document is a sequence of fields (comparable to the properties of a JSON object).
  • A field is a named sequence of terms.
  • A term is a sequence of bytes. (The same sequence of bytes in two different fields is considered a different term. Thus terms are represented as a pair: the string naming the field, and the bytes within the field.)
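The nesting described above can be sketched in a few lines of plain Java. This is only an illustration of the concepts: the class names DataModelSketch, Term, Field and Doc are hypothetical and are not Lucene classes, and whitespace splitting stands in for real text analysis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the concepts above; none of these are Lucene classes.
final class DataModelSketch {

    /** A term: the name of the field plus the bytes found in it. */
    static final class Term {
        final String field;
        final byte[] bytes;

        Term(String field, String text) {
            this.field = field;
            this.bytes = text.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Term)) return false;
            Term other = (Term) o;
            return field.equals(other.field) && Arrays.equals(bytes, other.bytes);
        }

        @Override
        public int hashCode() {
            return 31 * field.hashCode() + Arrays.hashCode(bytes);
        }
    }

    /** A field: a named sequence of terms. */
    static final class Field {
        final String name;
        final List<Term> terms = new ArrayList<>();

        Field(String name, String text) {
            this.name = name;
            // Whitespace splitting stands in for real text analysis.
            for (String token : text.trim().split("\\s+")) {
                terms.add(new Term(name, token));
            }
        }
    }

    /** A document: a sequence of fields. */
    static final class Doc {
        final List<Field> fields;

        Doc(Field... fields) {
            this.fields = Arrays.asList(fields);
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc(new Field("title", "go"), new Field("text", "go"));
        Term inTitle = doc.fields.get(0).terms.get(0);
        Term inText = doc.fields.get(1).terms.get(0);
        // Same bytes, different fields: the terms are not equal.
        System.out.println(inTitle.equals(inText)); // prints "false"
    }
}
```

The equality check is the point of the sketch: the bytes of "go" are identical in both fields, yet the two terms differ because the field name is part of the term.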

Document

A document is a basic unit of information that can be indexed.

For example, you can have a document for:

  • a single customer,
  • a single product,
  • a single order

Index

An index is a collection of documents that have somewhat similar characteristics.

Lucene's terms index falls into the family of indexes known as an inverted index because it can list, for a term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.

For example, you can have an index for:

  • customer data,
  • product catalog,
  • order data.
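To make the idea of an inverted index concrete, here is a minimal sketch in plain Java. It is not how Lucene stores its postings: the class InvertedIndexSketch and its methods are hypothetical, the tokenization is a naive split on non-word characters, and a single unnamed field is assumed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index over one field; not how Lucene stores postings.
final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Adds a document and returns the id assigned to it. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Lists, for a term, the ids of the documents that contain it. */
    SortedSet<Integer> documentsContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Abalone chowder");   // id 0
        index.addDocument("Tomato soup");       // id 1
        index.addDocument("Clam chowder soup"); // id 2
        System.out.println(index.documentsContaining("chowder")); // prints "[0, 2]"
        System.out.println(index.documentsContaining("soup"));    // prints "[1, 2]"
    }
}
```

The lookup goes from a term to its documents, which is the inverse of the natural direction in which each document lists its own terms.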

Query

Lucene comes with a rich query language 3)

Syntax:

[field:]expression

where:

  • field is the document field to which the expression applies. It is optional and defaults to the field text.
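The default-field rule can be illustrated with a small sketch in plain Java. This is not Lucene's QueryParser: the class ClauseSketch is hypothetical, it handles a single clause only, and it ignores escaping, operators and grouping.

```java
// Toy parser for a single "[field:]expression" clause; not Lucene's QueryParser.
final class ClauseSketch {
    final String field;
    final String expression;

    ClauseSketch(String field, String expression) {
        this.field = field;
        this.expression = expression;
    }

    /**
     * Splits a clause at the first colon. When no field prefix is present
     * (or the clause starts with a quoted phrase), defaultField is used.
     */
    static ClauseSketch parse(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon <= 0 || clause.startsWith("\"")) {
            return new ClauseSketch(defaultField, clause);
        }
        return new ClauseSketch(clause.substring(0, colon), clause.substring(colon + 1));
    }

    @Override
    public String toString() {
        return field + ":" + expression;
    }

    public static void main(String[] args) {
        // With "text" as the default field, both clauses resolve to the same thing.
        System.out.println(parse("text:go", "text")); // prints "text:go"
        System.out.println(parse("go", "text"));      // prints "text:go"
        System.out.println(parse("title:\"The Right Way\"", "text").field); // prints "title"
    }
}
```

In a real application the default field is whatever name is passed to the query parser when it is created, so text here is just the assumed default.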

Cheatsheet:

Relation            Expression
equals              attribute:"value"
does not equal      attribute:-"value"
contains            attribute:*value*
does not contain    attribute:-*value*
starts with         attribute:value*
ends with           attribute:*value
has                 has:attribute
missing             missing:attribute

Example:

  • Search the term go in the field text
text:go
# same as
go
  • Search the phrase "The Right Way" in the field title and the term go in the field text
title:"The Right Way" AND text:go
# same as
title:"The Right Way" AND go

Anatomy of a Lucene Application

To create a Lucene application, you should 4):

  • Create Documents by adding Fields;
  • Create an IndexWriter and add documents to it with addDocument();
  • Call QueryParser.parse() to build a query from a string; and
  • Create an IndexSearcher and pass the query to its search() method.

Example:

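import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.IOUtils;

// The statements below belong inside a test method that declares "throws Exception".
Analyzer analyzer = new StandardAnalyzer();

// Build the index in a temporary directory:
Path indexPath = Files.createTempDirectory("tempIndex");
Directory directory = FSDirectory.open(indexPath);
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
// TYPE_STORED keeps the original text so that it can be read back from a hit.
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();

// Now search the index:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
// Release the reader and the directory, then delete the temporary index:
ireader.close();
directory.close();
IOUtils.rm(indexPath);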

Example on how to index and query

Simple examples in the repository 5) are the IndexFiles and SearchFiles demo classes.

Usage:

java -cp lucene-core.jar:lucene-demo.jar:lucene-analysis-common.jar \
    org.apache.lucene.demo.IndexFiles \
    -index index \
    -docs your/directory/path
adding rec.food.recipes/soups/abalone-chowder
      [ ... ]
java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analysis-common.jar \
   org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
...
