About
Lucene 1) is a text search engine library.
The following applications are built on Lucene:
- New Relic Logs
- …
Structure
The text data model of Lucene is based on the following concepts 2):
- index,
- document,
- field
- and term.
An index contains a sequence of documents.
- A document is a sequence of fields.
- A field is a named sequence of terms.
- A term is a sequence of bytes. (The same sequence of bytes in two different fields is considered a different term. Thus terms are represented as a pair: the string naming the field, and the bytes within the field.)
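The rule that a term is a (field name, bytes) pair can be illustrated with a small sketch. ToyTerm is a hypothetical class written for this page in plain Java, not one of Lucene's own classes; it only shows why the same bytes in two different fields count as two different terms.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model (not a Lucene class): a term is a field name plus a byte sequence. */
final class ToyTerm {
    final String field;
    final byte[] bytes;

    ToyTerm(String field, String text) {
        this.field = field;
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ToyTerm)) {
            return false;
        }
        ToyTerm that = (ToyTerm) other;
        // Two terms are equal only if both the field name and the bytes match.
        return field.equals(that.field) && Arrays.equals(bytes, that.bytes);
    }

    @Override
    public int hashCode() {
        return 31 * field.hashCode() + Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        ToyTerm inTitle = new ToyTerm("title", "go");
        ToyTerm inText = new ToyTerm("text", "go");
        System.out.println(inTitle.equals(inText));                   // false: same bytes, different field
        System.out.println(inText.equals(new ToyTerm("text", "go"))); // true: same field, same bytes
    }
}
```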
Document
A document is a basic unit of information that can be indexed.
For example, you can have a document for:
- a single customer,
- a single product,
- a single order
Index
An index is a collection of documents that have somewhat similar characteristics.
Lucene's terms index falls into the family of indexes known as an inverted index because it can list, for a term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.
For example, you can have an index for:
- customer data,
- product catalog,
- order data.
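The inverted-index idea described above (for a term, list the documents that contain it) can be sketched in a few lines of plain Java. ToyInvertedIndex is a hypothetical class for illustration; its tokenization (lowercasing and splitting on non-alphanumeric characters) is an assumption of this sketch and does not reflect how Lucene analyzes text or stores its index.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index (illustration only, not Lucene's on-disk format). */
final class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Lowercases the text, splits it on non-alphanumeric characters and records each token. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Lists the documents containing the term (empty set if the term is unknown). */
    Set<Integer> documentsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(0, "Abalone chowder");
        index.add(1, "Clam chowder with bacon");
        index.add(2, "Tomato soup");
        System.out.println(index.documentsFor("chowder")); // [0, 1]
        System.out.println(index.documentsFor("soup"));    // [2]
    }
}
```

The forward direction (document to terms) is what the input looks like; the map built by add() is the inverse, which is what makes a term lookup cheap.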
Query
Lucene comes with a rich query language 3).
Syntax:
[field:]expression
where:
- field is the document field to which the expression applies. It is optional and defaults to the default field configured on the query parser (the field text in the examples below).
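Cheatsheet (operators such as has: and missing: are application-level extensions, not part of the classic Lucene query parser syntax):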
Relation | Expression |
---|---|
equals | attribute:"value" |
does not equal | attribute:-"value" |
contains | attribute:*value* |
does not contain | attribute:-*value* |
starts with | attribute:value* |
ends with | attribute:*value |
has | has:attribute |
missing | missing:attribute |
Example:
- Search for the term go in the field text
text:go
# same as
go
- Search for the phrase "The Right Way" in the field title and the term go in the field text
title:"The Right Way" AND text:go
# same as
title:"The Right Way" AND go
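The way an omitted field falls back to the default field can be sketched as follows. ToyClause is a hypothetical helper written for this page; it handles a single [field:]expression clause only and ignores boolean operators, escaping and everything else that Lucene's QueryParser supports.

```java
/** Toy resolver for one "[field:]expression" clause (not Lucene's QueryParser). */
final class ToyClause {
    final String field;
    final String expression;

    private ToyClause(String field, String expression) {
        this.field = field;
        this.expression = expression;
    }

    /** Uses defaultField when the clause carries no "field:" prefix. */
    static ToyClause parse(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        // No prefix, or a quoted phrase that merely contains a colon: use the default field.
        if (colon <= 0 || clause.startsWith("\"")) {
            return new ToyClause(defaultField, clause);
        }
        return new ToyClause(clause.substring(0, colon), clause.substring(colon + 1));
    }

    @Override
    public String toString() {
        return field + ":" + expression;
    }

    public static void main(String[] args) {
        System.out.println(ToyClause.parse("go", "text"));                      // text:go
        System.out.println(ToyClause.parse("text:go", "text"));                 // text:go
        System.out.println(ToyClause.parse("title:\"The Right Way\"", "text")); // title:"The Right Way"
    }
}
```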
Anatomy of a Lucene Application
To create a Lucene application, you should 4):
- Create Documents by adding Fields;
- Create an IndexWriter and add documents to it with addDocument();
- Call QueryParser.parse() to build a query from a string; and
- Create an IndexSearcher and pass the query to its search() method.
Example:
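import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.IOUtils;

public class LuceneExample {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Path indexPath = Files.createTempDirectory("tempIndex");
        Directory directory = FSDirectory.open(indexPath);
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        // Index a single document with one stored, tokenized text field:
        Document doc = new Document();
        String text = "This is the text to be indexed.";
        doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
        iwriter.addDocument(doc);
        iwriter.close();
        // Now search the index:
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        // Parse a simple query that searches for "text":
        QueryParser parser = new QueryParser("fieldname", analyzer);
        Query query = parser.parse("text");
        ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
        assertEquals(1, hits.length);
        // Iterate through the results:
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
        }
        ireader.close();
        directory.close();
        IOUtils.rm(indexPath);
    }
}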
Example of how to index and query
Simple examples in the repository 5) are:
- IndexFiles.java, which creates an index for all the files contained in a directory
- SearchFiles.java, which queries and searches an index
Usage:
java -cp lucene-core.jar:lucene-demo.jar:lucene-analysis-common.jar \
org.apache.lucene.demo.IndexFiles \
-index index \
-docs your/directory/path
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analysis-common.jar \
org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
...