Extending Lucene Classes
Many areas of the Lucene API expect the programmer to provide their own
implementation or specialization of a feature where the default is
inappropriate. For example, text analyzers and tokenizers are an area
where many parameters and environmental or cultural factors call for
customization.
PyLucene enables this through the Java extension points listed below,
which serve as proxies that let Java call back into the Python
implementations of these customizations.
To learn more about this topic, please refer to the PyLucene paper
included earlier.
Unless otherwise documented, a Python extension instance can be passed
wherever a wrapped Java instance returned by PyLucene is normally
expected; this is sufficient for the Python extension instance to be
wrapped for use by Java.
Each extension point below enumerates the methods that a Python class
needs to implement in order to function as an 'extension' of the
corresponding Java Lucene class.
. org.apache.lucene.analysis.Analyzer extension point:
TokenStream tokenStream(fieldName, reader)
. org.apache.lucene.analysis.CharTokenizer extension point:
boolean isTokenChar(char)
char normalize(char)
In order to instantiate such a custom char tokenizer, the additional
charTokenizer() factory method defined on
org.apache.lucene.analysis.TokenStream instances needs to be invoked
with the Python extension instance.
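As a rough illustration, the class below is a hypothetical CharTokenizer extension that keeps runs of letters and lowercases them; it assumes each char reaches Python as a one-character string. The simulate_char_tokenizer() helper is also hypothetical: it is a pure-Python stand-in for the Java CharTokenizer loop so the two callbacks can be tried without PyLucene, and it is not part of the PyLucene API.

```python
class LetterLowerCaseTokenizer(object):
    """Hypothetical CharTokenizer extension: letters only, lowercased."""

    def isTokenChar(self, c):
        # Only letters belong to a token; digits, punctuation and
        # whitespace end the current token.
        return c.isalpha()

    def normalize(self, c):
        # Each accepted char is lowercased before it joins the token.
        return c.lower()


def simulate_char_tokenizer(tokenizer, text):
    """Hypothetical stand-in for the Java CharTokenizer loop (not PyLucene)."""
    tokens, current = [], []
    for c in text:
        if tokenizer.isTokenChar(c):
            current.append(tokenizer.normalize(c))
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(simulate_char_tokenizer(LetterLowerCaseTokenizer(), "Hello, World 42!"))
# prints ['hello', 'world']
```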
. org.apache.lucene.analysis.TokenFilter extension point:
Token next()
In order to instantiate such a custom token filter, the additional
tokenFilter() factory method defined on
org.apache.lucene.analysis.TokenStream instances needs to be invoked
with the Python extension instance.
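A minimal sketch of a TokenFilter extension that drops stop words, assuming the upstream tokens expose termText() as the Lucene 2.x Token class does and that returning None (Java null) ends the stream. StopWordFilter, FakeToken and FakeTokenStream are hypothetical names; the two Fake classes only stand in for Lucene objects so the filter can run on its own, and the tokenFilter() wrapping step described above is not shown.

```python
class StopWordFilter(object):
    """Hypothetical TokenFilter extension that drops stop words."""

    def __init__(self, tokenStream, stopWords):
        self.tokenStream = tokenStream      # upstream token stream
        self.stopWords = set(stopWords)

    def next(self):
        # Pull tokens from upstream, skipping stop words;
        # None (Java null) signals the end of the stream.
        token = self.tokenStream.next()
        while token is not None and token.termText() in self.stopWords:
            token = self.tokenStream.next()
        return token


# Stand-ins for Lucene's Token and TokenStream, for trying the filter only.
class FakeToken(object):
    def __init__(self, text):
        self.text = text

    def termText(self):
        return self.text


class FakeTokenStream(object):
    def __init__(self, words):
        self.tokens = [FakeToken(w) for w in words]

    def next(self):
        return self.tokens.pop(0) if self.tokens else None


f = StopWordFilter(FakeTokenStream(["the", "quick", "a", "fox"]), ["the", "a"])
words = []
token = f.next()
while token is not None:
    words.append(token.termText())
    token = f.next()
print(words)  # prints ['quick', 'fox']
```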
. org.apache.lucene.analysis.TokenStream extension point:
Token next()
. org.apache.lucene.queryParser.QueryParser extension point:
Query getBooleanQuery(super, clauses)
Query getFieldQuery(super, fieldName, queryText, slop=None)
Query getFuzzyQuery(super, fieldName, termText, minSimilarity)
Query getPrefixQuery(super, fieldName, termText)
Query getRangeQuery(super, fieldName, part1, part2, inclusive)
Query getWildcardQuery(super, fieldName, termText)
The 'super' argument is provided to invoke the default Java
implementation of these methods as needed.
In order to instantiate such a custom query parser, the additional
queryParser() factory method defined on
org.apache.lucene.analysis.Analyzer instances needs to be invoked
with the Python extension instance.
Please refer to the AdvancedQueryParserTest.py and
CustomQueryParser.py 'Lucene in Action' samples for more details.
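The sketch below is a hypothetical QueryParser extension, roughly in the spirit of the CustomQueryParser sample: it rejects wildcard and fuzzy queries and hands plain field queries back to the default Java implementation through 'super'. It assumes the 'super' object exposes getFieldQuery(fieldName, queryText) and getFieldQuery(fieldName, queryText, slop) like the Java overloads. ValueError is a placeholder for whatever error your application reports, and FakeSuper is a test stand-in, not a PyLucene class.

```python
class RestrictedQueryParser(object):
    """Hypothetical QueryParser extension limiting expensive queries."""

    def getWildcardQuery(self, super, fieldName, termText):
        # Refuse wildcard queries outright.
        raise ValueError("wildcard queries are not allowed: " + termText)

    def getFuzzyQuery(self, super, fieldName, termText, minSimilarity):
        # Refuse fuzzy queries outright.
        raise ValueError("fuzzy queries are not allowed: " + termText)

    def getFieldQuery(self, super, fieldName, queryText, slop=None):
        # Defer to the default Java implementation through 'super'
        # (assumed to mirror the two Java getFieldQuery overloads).
        if slop is None:
            return super.getFieldQuery(fieldName, queryText)
        return super.getFieldQuery(fieldName, queryText, slop)


class FakeSuper(object):
    # Test stand-in for the 'super' object; echoes its arguments.
    def getFieldQuery(self, fieldName, queryText, slop=None):
        return (fieldName, queryText, slop)
```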
. org.apache.lucene.search.Filter extension point:
BitSet bits(indexReader)
. org.apache.lucene.search.FilteredTermEnum extension point:
float difference()
boolean termCompare(term)
boolean endEnum()
void setEnum(termEnum)
. org.apache.lucene.search.HitCollector extension point:
void collect(docNum, score)
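A hypothetical HitCollector extension that keeps only hits scoring at or above a threshold. In Lucene 2.x such an object is handed to a searcher's search(query, hitCollector) method; here collect() is simply called directly to show the callback.

```python
class ThresholdCollector(object):
    """Hypothetical HitCollector extension keeping hits above a score."""

    def __init__(self, minScore):
        self.minScore = minScore
        self.hits = []          # (docNum, score) pairs, in call order

    def collect(self, docNum, score):
        # Called once per matching document; calls are not sorted by score.
        if score >= self.minScore:
            self.hits.append((docNum, score))


collector = ThresholdCollector(0.5)
collector.collect(3, 0.9)
collector.collect(7, 0.2)
collector.collect(1, 0.5)
print(collector.hits)  # prints [(3, 0.9), (1, 0.5)]
```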
. org.apache.lucene.search.ScoreDocComparator extension point:
int compare(scoreDoc0, scoreDoc1)
int sortType()
Comparable sortValue(scoreDoc)
. org.apache.lucene.search.SortComparator extension point:
ScoreDocComparator newComparator(indexReader, fieldName)
Comparable getComparable(termText)
. org.apache.lucene.search.SortComparatorSource extension point:
ScoreDocComparator newComparator(indexReader, fieldName)
Please refer to the DistanceComparatorSource.py and
DistanceSortingTest.py 'Lucene in Action' samples for more details on
writing custom sorting code in Python with the ScoreDocComparator,
SortComparator and SortComparatorSource extension points.
. org.apache.lucene.search.Searchable extension point:
void close()
int docFreq(term)
Document doc(n)
int maxDoc()
void searchAll(query, filter, hitCollector)
TopDocs search(query, filter, n)
TopFieldDocs searchSorted(query, filter, n, sort)
Query rewrite(query)
Explanation explain(query, docNum)
. org.apache.lucene.search.Similarity extension point:
float coord(overlap, maxOverlap)
float idf(term, searcher)
float idf(terms, searcher)
float idf(docFreq, numDocs)
float lengthNorm(fieldName, numTokens)
float queryNorm(sumOfSquaredWeights)
float sloppyFreq(distance)
float tf(freq)
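As an illustration, here is a hypothetical Similarity extension that reuses the formulas of Lucene's DefaultSimilarity, except that tf() only records whether a term occurs. Only the numeric idf(docFreq, numDocs) variant is sketched; the idf(term, searcher) and idf(terms, searcher) variants listed above would have to be handled by the same Python method, which is not shown. The zero guards in lengthNorm() and queryNorm() are additions of this sketch.

```python
import math


class BinaryTfSimilarity(object):
    """Hypothetical Similarity extension: DefaultSimilarity-style formulas,
    except that term frequency only records presence (0 or 1)."""

    def tf(self, freq):
        # Repeating a term in a document does not raise its score.
        return 1.0 if freq > 0 else 0.0

    def idf(self, docFreq, numDocs):
        # Same shape as DefaultSimilarity: log(numDocs / (docFreq + 1)) + 1.
        return math.log(numDocs / float(docFreq + 1)) + 1.0

    def lengthNorm(self, fieldName, numTokens):
        # Shorter fields weigh more; empty fields get a neutral 1.0.
        return 1.0 / math.sqrt(numTokens) if numTokens > 0 else 1.0

    def queryNorm(self, sumOfSquaredWeights):
        # Makes scores of different queries roughly comparable.
        if sumOfSquaredWeights <= 0:
            return 1.0
        return 1.0 / math.sqrt(sumOfSquaredWeights)

    def sloppyFreq(self, distance):
        # Closer sloppy-phrase matches count more.
        return 1.0 / (distance + 1)

    def coord(self, overlap, maxOverlap):
        # Fraction of the query terms that the document matched.
        return overlap / float(maxOverlap)
```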
. org.apache.lucene.search.highlight.Formatter extension point:
string highlightTerm(originalText, tokenGroup)
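A hypothetical Formatter extension that wraps matching text in <b> tags, similar in spirit to the highlighter's SimpleHTMLFormatter. It assumes the tokenGroup argument offers getTotalScore(), as the highlighter's TokenGroup class does, with a score of zero for text that matched no query term; FakeTokenGroup is only a test stand-in.

```python
class BoldFormatter(object):
    """Hypothetical highlighter Formatter extension."""

    def highlightTerm(self, originalText, tokenGroup):
        # A total score of 0 means this text matched no query term.
        if tokenGroup.getTotalScore() <= 0:
            return originalText
        return "<b>" + originalText + "</b>"


class FakeTokenGroup(object):
    # Stand-in for org.apache.lucene.search.highlight.TokenGroup.
    def __init__(self, totalScore):
        self.totalScore = totalScore

    def getTotalScore(self):
        return self.totalScore
```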
. org.apache.lucene.store.Directory extension point:
void close()
IndexOutput createOutput(name)
void deleteFile(name)
boolean fileExists(name)
long fileLength(name)
long fileModified(name)
string[] list()
Lock makeLock(name)
IndexInput openInput(name)
void renameFile(from, to)
void touchFile(name)
. org.apache.lucene.store.IndexInput extension point:
void close(isClone)
long length()
string read(length, pos)
void seek(pos)
Because IndexInput instances may be cloned, the close() method takes
an extra argument in Python indicating whether a clone is being closed.
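A minimal in-memory IndexInput sketch. It assumes read(length, pos) must return exactly length bytes starting at absolute file position pos, so seek() merely records the position; StringIndexInput is a hypothetical name. Following the note about clones, close() only marks the input closed when the original, not a clone, is closed.

```python
class StringIndexInput(object):
    """Hypothetical IndexInput extension serving a file held in memory."""

    def __init__(self, data):
        self.data = data        # the whole file as a byte string
        self.pos = 0
        self.closed = False

    def length(self):
        return len(self.data)

    def read(self, length, pos):
        # Assumption: return `length` bytes starting at file position `pos`.
        if pos + length > len(self.data):
            raise IOError("read past EOF")
        return self.data[pos:pos + length]

    def seek(self, pos):
        # read() receives the position explicitly, so only record it here.
        self.pos = pos

    def close(self, isClone):
        # Closing a clone leaves the original usable.
        if not isClone:
            self.closed = True
```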
. org.apache.lucene.store.IndexOutput extension point:
void close()
long length()
void write(string)
void seek(pos)
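A companion in-memory IndexOutput sketch (BufferIndexOutput is a hypothetical name). It assumes write() receives a byte string to store at the current position, overwriting existing bytes after a seek() and zero-filling any gap past the current end.

```python
class BufferIndexOutput(object):
    """Hypothetical IndexOutput extension that writes into memory."""

    def __init__(self):
        self.buffer = bytearray()
        self.pos = 0
        self.closed = False

    def write(self, data):
        # Overwrite (or extend) the buffer at the current position.
        end = self.pos + len(data)
        if end > len(self.buffer):
            self.buffer.extend(b"\0" * (end - len(self.buffer)))
        self.buffer[self.pos:end] = data
        self.pos = end

    def seek(self, pos):
        self.pos = pos

    def length(self):
        return len(self.buffer)

    def close(self):
        self.closed = True
```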
. org.apache.lucene.store.Lock extension point:
boolean isLocked()
boolean obtain()
boolean obtain(lockWaitTimeout)
void release()
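A hypothetical Lock extension that only excludes holders within the current Python process, which may suit a single-process index but not one shared between processes. It assumes one Python obtain() with an optional argument serves both Java overloads, and that lockWaitTimeout is in milliseconds as in Lucene. On timeout the Java Lock.obtain(long) raises an error, whereas this sketch simply returns False.

```python
import threading
import time


class InProcessLock(object):
    """Hypothetical Lock extension; only excludes holders in this process."""

    _held = set()               # lock names currently held, shared by all instances
    _guard = threading.Lock()   # protects _held

    def __init__(self, name):
        self.name = name

    def isLocked(self):
        with self._guard:
            return self.name in self._held

    def obtain(self, lockWaitTimeout=None):
        # Without a timeout, make a single attempt.
        if lockWaitTimeout is None:
            return self._tryObtain()
        # With a timeout (milliseconds), poll until it expires.
        deadline = time.time() + lockWaitTimeout / 1000.0
        while not self._tryObtain():
            if time.time() >= deadline:
                return False
            time.sleep(0.01)
        return True

    def _tryObtain(self):
        with self._guard:
            if self.name in self._held:
                return False
            self._held.add(self.name)
            return True

    def release(self):
        with self._guard:
            self._held.discard(self.name)
```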
. java.io.Reader extension point:
void close()
string read(len)
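A hypothetical Reader extension over a Python string. It returns up to the requested number of characters per read() call and assumes that an empty string tells the Java side the input is exhausted; that end-of-input convention is an assumption to verify against your PyLucene version. The parameter is named length rather than len to avoid shadowing the Python builtin.

```python
class StringReader(object):
    """Hypothetical java.io.Reader extension over a Python string."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self, length):
        # Return up to `length` characters; "" is assumed to mean end of input.
        chunk = self.text[self.pos:self.pos + length]
        self.pos += len(chunk)
        return chunk

    def close(self):
        # Drop the text; later reads return "".
        self.text = ""
        self.pos = 0
```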
. java.lang.Comparable extension point:
int compareTo(object)
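A hypothetical Comparable extension wrapping a distance, of the kind the sorting extension points above could return from sortValue() or getComparable(); compareTo() follows the usual negative, zero or positive contract.

```python
class Distance(object):
    """Hypothetical java.lang.Comparable extension wrapping a distance."""

    def __init__(self, km):
        self.km = km

    def compareTo(self, other):
        # Negative, zero or positive, as the Comparable contract requires.
        if self.km < other.km:
            return -1
        if self.km > other.km:
            return 1
        return 0
```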
. java.lang.Runnable extension point:
void run()