PyConProposals2005 (r2 - 04 Jan 2005 - 23:14:25 - PieterHartsook)

Developer Platform

Ted Leung, Katie Capps Parlante
Open Source Applications Foundation
543 Howard Street, 5th Floor
San Francisco, CA 94105
twl@osafoundation.org, capps@osafoundation.org
45 minute slot

Chandler aspires to be an innovative open source personal information manager (PIM). In addition to being written in Python, Chandler uses the following open source libraries: BerkeleyDB, M2Crypto, Twisted, PyLucene, and wxPython/wxWidgets. Chandler is designed to be extensible. Its unit of extensibility is called a parcel, and Chandler's "built-in" functionality is itself composed of parcels. Internally, Chandler is designed as layers of frameworks that provide application functionality to parcels. Parcels communicate with each other via the data in the Chandler repository.

There are several opportunities for open source developers to get involved with the Chandler project:

1. Many developers will be interested in producing their own parcels. These parcels can extend Chandler to deal with new data types (called Kinds) and new user interfaces to that data. These parcels can leverage any data type in the Chandler repository, whether the data type is supplied by the base Chandler application or by another parcel.

2. Some developers will be interested in extending, improving, or fixing bugs in the core application frameworks of the Chandler system.
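
As a rough illustration of the first point, the sketch below shows two parcels cooperating only through shared data: one contributes a new Kind and some items, the other queries items of that Kind without importing the first. It is a toy model written in Python 3. The names Repository, Kind, define_kind, add_item and items_of, as well as the "Recipe" example, are hypothetical and are not Chandler's actual API.

```python
class Kind:
    """A named data type with a fixed set of attribute names (hypothetical)."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = tuple(attributes)


class Repository:
    """Toy shared store: parcels register Kinds and read/write items."""

    def __init__(self):
        self.kinds = {}
        self.items = []

    def define_kind(self, name, attributes):
        kind = Kind(name, attributes)
        self.kinds[name] = kind
        return kind

    def add_item(self, kind_name, **values):
        kind = self.kinds[kind_name]
        # Reject attributes that the Kind does not declare.
        unknown = set(values) - set(kind.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        item = {"kind": kind_name, **values}
        self.items.append(item)
        return item

    def items_of(self, kind_name):
        return [i for i in self.items if i["kind"] == kind_name]


repo = Repository()

# Hypothetical "recipes" parcel: contributes a new Kind and some data.
repo.define_kind("Recipe", ["title", "minutes"])
repo.add_item("Recipe", title="Pancakes", minutes=20)
repo.add_item("Recipe", title="Risotto", minutes=45)

# Hypothetical "quick meals" parcel: it never imports the recipes parcel,
# it only queries the shared repository for items of that Kind.
quick = [r["title"] for r in repo.items_of("Recipe") if r["minutes"] <= 30]
```

The point of the design is that the second parcel depends on the data type, not on the code of the parcel that defined it.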

The goal of our presentation and paper is to allow someone to begin developing a parcel that extends the Chandler user interface. We plan to cover the following topics using an existing Chandler parcel as a concrete example:

  • Introduction to the basic Chandler User Interface elements
  • General overview of Chandler architecture
    • Repository and data model
    • Services (e-mail, WebDAV, etc.)
    • Chandler Presentation/Interaction framework (CPIA)

  • How to organize a typical extension parcel
  • How to extend the content model schema with new Kinds
  • How to add a new detail view for a new Kind
  • How to extend the sidebar with new Kinds and new collections
  • How to add a new type of summary view
  • How to extend Chandler with new menus and toolbar buttons
  • How to add background tasks

PyLucene

Proposal for PyCon 2005 45-minute talk about PyLucene
=====================================================

Title: Pulling Java Lucene into Python: PyLucene

Topic: Python Integration

         As we needed an open source text search engine library for our
         Python-based project, we made the following bet: what if we pulled
         together Java Lucene, GNU's gcj Java compiler and SWIG to build a
         Python extension? In this presentation we'd like to talk about the
         challenges we have met since this project was started a year ago.

 OSAF's flagship project, Chandler (http://www.osafoundation.org), is a
 personal information manager. As such, it needs the ability to run
 unstructured full text queries over arbitrarily large repositories of
 text.
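
 To make the requirement concrete: the core data structure behind a full
 text search engine such as Lucene is an inverted index, which maps each
 term to the documents that contain it. The following is a minimal Python 3
 sketch of that idea only. The class name TinyIndex, its methods and the
 sample item ids are invented for illustration and are not the API of
 Lucene, PyLucene or Chandler.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on runs of non-alphanumeric characters.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyIndex()
index.add("note-1", "Lunch with Katie about the PyCon proposal")
index.add("mail-7", "PyCon 2005 proposal deadline is approaching")
index.add("task-3", "Buy groceries")

matches = index.search("pycon proposal")
```

 A production engine adds analyzers, ranking and on-disk storage on top of
 this structure, which is why reusing an existing library is attractive.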

 There are not that many open source text search engines available. Lucene
 is considered among the better ones and it is licensed under the Apache
 license, both of which make it a very attractive solution.

 But it is written in Java.

 For various reasons OSAF would prefer not to ship Chandler with a
 dependency on a full-fledged JVM, which made a Jython
 (http://www.jython.org) or JPype (http://jpype.sourceforge.net) based
 solution undesirable.

 There are several ports of Java Lucene to other languages:

  - C++/CLucene (http://sourceforge.net/projects/clucene/)
    While 4 times faster than the original Java version, CLucene, like most
    other ports, lags behind it and, like most C++ projects, comes with its
    own set of bugs.
  - Python/Lupy (http://www.divmod.org/Home/Projects/Lupy)
    The advantage of a fully native Python port is offset by performance an
    order of magnitude worse than that of the original Java version.
  - .NET/dotLucene (http://openlucene.net)
    While not behind on the porting curve, dotLucene only swaps one problem
    for another: it requires a .NET VM.

 OSAF wanted something that would be simple to deliver, ideally no more
 than a handful of shared libraries. In theory, it should be possible to put
 Java Lucene, GNU's Java compiler gcj and SWIG together into a native shared
 library built as a Python extension. It would have to run on Linux, Mac OS X
 and Windows, be stable, and support threading, and the text indexes would
 have to be part of Chandler's repository, sharing its transactions.

 The PyLucene project was started in December 2003 with a number of
 unresolved challenges down the road. It really started as an exploration,
 by getting acquainted with several projects:

  - Java Lucene (http://jakarta.apache.org/lucene/docs/index.html) is a
    decently written 100% Java code base using very few arcane Java
    features, no GUI, no threading in the core, and putting a lot of effort
    into maintaining backwards compatibility with JVMs going back to 1.2.x
    releases. The Java Lucene project is being actively developed by a
    sizable community of volunteers.

    Would Lucene use Java constructs that trigger bugs in gcj?
    Would Lucene use APIs not yet supported in libgcj?

  - GNU's gcj compiler (http://gcc.gnu.org/java) is a derivative of their
    C++ compiler with a massive Java runtime library, libgcj, and a garbage
    collector, boehm-gc, claiming support for most standard JVM APIs up to
    1.4.x. Like Java Lucene, gcj is under active development.
    GNU's gcj compiles Java classes from sources or bytecode into a native
    shared library, making the code available as if these classes were C++
    via the Compiled Native Interface (CNI), an alternative to the Java
    Native Interface (JNI).

    Would there be proper support on Windows? Would the bugs be
    unbearable? How would Python threads and Java threads integrate?

  - SWIG (http://www.swig.org) is a software development tool that connects
    programs written in C and C++ with a variety of high-level programming
    languages including Python. SWIG generates a lot of boilerplate code from
    source written in a special C-like syntax.

    Would SWIG let itself be bent to accommodate something that looks very
    much like C++ but whose memory is managed rather differently?

  - Berkeley DB's Java integration is a wrapper around its C library, just
    like the _bsddb Python extension
    (http://sourceforge.net/projects/pybsddb).

    Would it be possible to pass db objects such as transactions, databases
    or environments from Python to compiled Java so that PyLucene and
    Chandler could use the same transactions when persisting data?
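
 The last question is about atomicity: an item and its index entries should
 be committed together or not at all. The sketch below illustrates that
 property only, in Python 3. It substitutes the standard library's sqlite3
 module for Berkeley DB, and the table names and helper functions are
 hypothetical. It does not show how PyLucene passes Berkeley DB handles
 between Python and compiled Java.

```python
import sqlite3

# One connection stands in for the shared database environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE postings (term TEXT, item_id TEXT)")
conn.commit()


def store_item(cur, item_id, body):
    # "Repository" write.
    cur.execute("INSERT INTO items VALUES (?, ?)", (item_id, body))


def index_item(cur, item_id, body):
    # "Text index" write, issued inside the caller's transaction.
    for term in set(body.lower().split()):
        cur.execute("INSERT INTO postings VALUES (?, ?)", (term, item_id))


def save(item_id, body, fail=False):
    # sqlite3 opens a transaction implicitly before the first INSERT;
    # both writes below belong to it and are committed or undone together.
    cur = conn.cursor()
    try:
        store_item(cur, item_id, body)
        index_item(cur, item_id, body)
        if fail:
            raise RuntimeError("simulated failure before commit")
        conn.commit()
        return True
    except RuntimeError:
        conn.rollback()
        return False


def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


saved = save("note-1", "pylucene talk")
failed = save("note-2", "never persisted", fail=True)
```

 After the simulated failure neither the item nor its index terms remain,
 which is the behavior a text index needs if it is to share transactions
 with the repository that holds the data.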

 During this talk we will explore the solutions we developed for PyLucene,
 including:

  - Compiler support on various operating systems: Mac OS X, Windows, Linux
  - Building PyLucene from sources: getting all the pieces together
  - Thread integration issues: the libgcj garbage collector's needs
  - Memory management differences: reference counting versus garbage
    collection
  - 'Extending' Java classes from Python via wrappers: reverse SWIG
  - Footprint issues: statically linking libgcj with the Python extension
  - Code samples: examples of using PyLucene in Python code
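
 The reference counting side of the memory management item can be observed
 from Python itself. The sketch below assumes CPython 3, where objects are
 freed as soon as their reference count drops to zero and a separate cycle
 collector handles reference cycles. The Node class is invented for the
 demonstration, and the sketch does not show how PyLucene reconciles
 Python's model with libgcj's garbage collector.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None


# Keep the cycle collector from running on its own during the demo.
gc.disable()

# Case 1: no cycle. Reference counting frees the object as soon as the
# last reference goes away.
a = Node("a")
a_alive = weakref.ref(a)
del a
freed_by_refcount = a_alive() is None

# Case 2: a reference cycle. Each node keeps the other's count above
# zero, so reference counting alone never frees them.
b, c = Node("b"), Node("c")
b.partner, c.partner = c, b
b_alive = weakref.ref(b)
del b, c
survived_refcount = b_alive() is not None

# The cycle collector finds and frees the unreachable pair.
gc.collect()
freed_by_gc = b_alive() is None

gc.enable()
```

 A wrapper layer between the two runtimes has to keep a Java object alive
 while Python still references it, and the reverse, even though the two
 sides decide when to free an object in different ways.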

 We'll also discuss future developments such as:

  - How to apply the same techniques to other projects or other languages
    (Ruby, Perl, etc.)


 The PyLucene project was made into a separate project in June
 2004. PyLucene is hosted by OSAF and licensed under the MIT license
 (http://www.opensource.org/licenses/mit-license.php).  PyLucene is under
 active development and has a small community of regular users. It has been
 deployed in a handful of controlled projects.

 PyLucene's homepage is http://pylucene.osafoundation.org. 