Developer Platform
Ted Leung, Katie Capps Parlante
Open Source Applications Foundation
543 Howard Street, 5th Floor
San Francisco, CA 94105
twl@osafoundation.org,
capps@osafoundation.org
45 minute slot
Chandler aspires to be an innovative open source personal information manager (PIM). In addition to being written in Python, Chandler uses the following open source libraries: BerkeleyDB, M2Crypto, Twisted, PyLucene, and wxPython/wxWidgets. Chandler is designed to be extensible. Its unit of extensibility is called a parcel, and Chandler's "built-in" functionality is itself composed of parcels. Internally, Chandler is designed as layers of frameworks that provide application functionality to parcels. Parcels communicate with each other via the data in the Chandler repository.
There are several opportunities for open source developers to get involved
with the Chandler project:
1. Many developers will be interested in producing their own parcels. These parcels can extend Chandler to deal with new data types (called Kinds) and new user interfaces to that data. These parcels can leverage any data type in the Chandler repository, whether the data type is supplied by the base Chandler application or by another parcel.
2. Some developers will be interested in extending, improving, or fixing bugs in the core application frameworks of the Chandler system.
The goal of our presentation and paper is to allow someone to begin developing a parcel that extends the Chandler user interface. We plan to cover the following topics using an existing Chandler parcel as a concrete example:
- Introduction to the basic Chandler User Interface elements
- General overview of Chandler architecture
- Repository and data model
- Services (e-mail, Journal.WebDAV, etc)
- Chandler Presentation/Interaction framework (CPIA)
- How to organize a typical extension parcel
- How to extend the content model schema with new Kinds
- How to add a new detail view for a new Kind
- How to extend the sidebar with new Kinds and new collections
- How to add a new type of summary view
- How to extend Chandler with new menus and toolbar buttons
- How to add background tasks
PyLucene
Proposal for PyCon 2005 45-minute talk about PyLucene
=====================================================
Title: Pulling Java Lucene into Python: PyLucene
Topic: Python Integration
As we needed an open source text search engine library for our
Python-based project, we made the following bet: what if we pulled
together Java Lucene, GNU's gcj Java compiler and SWIG to build a
Python extension? In this presentation we'd like to talk about the
challenges we have met since this project was started a year ago.
OSAF's flagship project, Chandler (http://www.osafoundation.org), is a
personal information manager. As such, it needs the ability to run
unstructured full text queries over arbitrarily large repositories of
text.
There are not that many open source text search engines available. Lucene
is considered among the better ones and it is licensed under the Apache
license, both of which make it a very attractive solution.
But it is written in Java.
For various reasons OSAF would prefer not to ship Chandler requiring a
fully fledged JVM, which made a Jython (http://www.jython.org) or JPype
(http://jpype.sourceforge.net) based solution undesirable.
There are several ports of Java Lucene to other languages:
- C++/CLucene (http://sourceforge.net/projects/clucene/)
While 4 times faster than the original Java version, CLucene, like most
other ports, lags behind Java Lucene and, like most C++ projects, comes
with its own set of bugs.
- Python/Lupy (http://www.divmod.org/Home/Projects/Lupy)
The advantage of a fully native Python port is outweighed by performance
an order of magnitude worse than that of the original Java version.
- .NET/dotLucene (http://openlucene.net)
While not behind on the porting curve, dotLucene only swaps one problem
for another: it requires a .NET VM.
OSAF wanted something that would be simple to deliver, ideally no more
than a handful of shared libraries. In theory, it should be possible to put
Java Lucene, GNU's Java compiler gcj and SWIG together into a native shared
library built as a Python extension. It would have to run on Linux, Mac OS X
and Windows, be stable, support threading, and the text indexes would have
to be part of Chandler's repository, sharing transactions.
The PyLucene project was started in December 2003 with a number of
unresolved challenges down the road. It really started as an exploration,
by getting acquainted with several projects:
- Java Lucene (http://jakarta.apache.org/lucene/docs/index.html) is a
decently written 100% Java code base using very few arcane Java
features, no GUI, no threading in the core, and putting a lot of effort
into maintaining backwards compatibility with JVMs going back to 1.2.x
releases. The Java Lucene project is being actively developed by a
sizable community of volunteers.
Would Lucene use Java constructs that trigger bugs in gcj?
Would Lucene use APIs not yet supported in libgcj?
- GNU's gcj compiler (http://gcc.gnu.org/java) is a derivative of their
C++ compiler with a massive Java Runtime library, libgcj, a garbage
collector, boehm-gc, claiming support for most standard JVM APIs up to
1.4.x. Like Java Lucene, gcj is under active development as well.
GNU's gcj compiles Java classes from sources or bytecode into a native
shared library, making the code available as if these classes were C++
via the Compiled Native Interface (CNI), an alternative to the Java
Native Interface (JNI).
Would there be proper support on Windows? Would the bugs be just too
unbearable? How would Python threads and Java threads integrate?
- SWIG (http://www.swig.org) is a software development tool that connects
programs written in C and C++ with a variety of high-level programming
languages including Python. SWIG generates a lot of boilerplate code from
source written in a special C-like syntax.
Would SWIG let itself be bent to accommodate something that looks very
much like C++ but whose memory is managed rather differently?
- Berkeley DB's Java integration is a wrapper around its C library, just
like the _bsddb Python extension
(http://sourceforge.net/projects/pybsddb).
Would it be possible to pass db objects such as transactions, databases
or environments from Python to compiled Java such that PyLucene and
Chandler were able to use the same transactions when persisting data ?
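The question of sharing transactions can be made concrete with a small sketch. It is not PyLucene or Chandler code: it swaps in Python's standard sqlite3 module for Berkeley DB, and the table names and the functions open_store, store_item, index_item and save are hypothetical. It only illustrates why the item store and the text index want to share one transaction handle: when both write through the same transaction, a failure while indexing also rolls back the stored item.

```python
import sqlite3


def open_store():
    """Create an in-memory database holding both the items and a toy index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE postings (word TEXT, item_id INTEGER)")
    conn.commit()
    return conn


def store_item(conn, item_id, body):
    """Stand-in for the repository writing an item."""
    conn.execute("INSERT INTO items (id, body) VALUES (?, ?)", (item_id, body))


def index_item(conn, item_id, body):
    """Stand-in for the text index: one posting per distinct word."""
    for word in set(body.lower().split()):
        conn.execute(
            "INSERT INTO postings (word, item_id) VALUES (?, ?)", (word, item_id)
        )


def save(conn, item_id, body, fail_indexing=False):
    """Write the item and its index entries inside one shared transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            store_item(conn, item_id, body)
            if fail_indexing:
                raise RuntimeError("simulated indexing failure")
            index_item(conn, item_id, body)
        return True
    except RuntimeError:
        return False
```

Because both helpers receive the same connection, a call such as save(conn, 2, "lost text", fail_indexing=True) leaves neither an item row nor any postings behind.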
During this talk we will explore the solutions we developed for PyLucene,
including:
- Compiler support on various operating systems: Mac OS X, Windows, Linux
- Building PyLucene from sources: getting all the pieces together
- Thread integration issues: the libgcj garbage collector's needs
- Memory management differences: ref counting versus garbage collection
- 'Extending' Java classes from Python via wrappers: reverse SWIG
- Footprint issues: statically linking libgcj with the python extension
- Code samples: examples of using PyLucene in Python code
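The memory management item in the list above contrasts reference counting with garbage collection. As a minimal sketch of the Python side only (it shows nothing of libgcj or PyLucene, and the Node class and both function names are hypothetical), the following assumes CPython: an object is freed as soon as its last reference goes away, while a reference cycle survives until a collector pass finds it. A garbage-collected runtime gives no such immediate release, which is the mismatch a wrapper layer has to bridge.

```python
import gc
import weakref


class Node:
    """Plain Python object used only for this illustration."""

    def __init__(self):
        self.other = None


def refcount_frees_immediately():
    """Return True if the object is gone right after its last reference is dropped."""
    node = Node()
    probe = weakref.ref(node)  # a weak reference does not keep the object alive
    del node                   # reference count reaches zero here (CPython)
    return probe() is None


def cycle_needs_collector():
    """Return (alive before collection, alive after collection) for a reference cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure no automatic collector pass interferes
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a    # a and b now reference each other
        probe = weakref.ref(a)
        del a, b                   # counts stay above zero because of the cycle
        alive_before = probe() is not None
        gc.collect()               # the cycle collector reclaims both objects
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```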
We'll also discuss future developments such as:
- How to apply the same techniques to other projects or other languages
(Ruby, Perl, etc.)
The PyLucene project was made into a separate project in June
2004. PyLucene is hosted by OSAF and licensed under the MIT license
(http://www.opensource.org/licenses/mit-license.php). PyLucene is under
active development and has a small community of regular users. It has been
deployed in a handful of controlled projects.
PyLucene's homepage is http://pylucene.osafoundation.org.