r1 - 25 May 2004 - 10:45:44 - BrianKirsch

Email from Menno Smits

From:   menno@netbox.biz
Subject:    [Email-SIG] Handling large emails: DiskMessage and DiskFeedParser
Date:    May 24, 2004 6:15:41 PM PDT
To:      email-sig@python.org

Hi all,

FeedParser is great because it doesn't load the entire message into memory during parsing (yes, I realise there are other
reasons for FeedParser existing too). However, once the message is parsed, the attachment bodies are still loaded entirely
into memory when Message instances are created and populated. This is a big problem for real-world environments where
large messages are possible: all available memory is consumed and the machine grinds to a halt. We see large (40MB+)
emails all the time, and problems start to occur when several of these are being processed simultaneously.
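
A minimal sketch of the incremental parsing described above, assuming a current Python 3 standard library rather than the Python 2.3.3 used for this post. The names parse_in_chunks and chunk_size are illustrative and not part of the email package. The source is read in bounded pieces, but the returned Message still holds every payload in memory, which is the problem described here.

```python
import io
from email.feedparser import FeedParser


def parse_in_chunks(fileobj, chunk_size=8192):
    """Feed a text-mode file to FeedParser one bounded chunk at a time."""
    parser = FeedParser()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # FeedParser buffers partial lines, so chunks may end anywhere.
        parser.feed(chunk)
    # close() finishes parsing and returns the root Message object.
    return parser.close()


# Usage: a tiny in-memory message stands in for a large file on disk.
raw = "Subject: demo\n\nbody line\n"
msg = parse_in_chunks(io.StringIO(raw), chunk_size=5)
print(msg["Subject"], len(msg.get_payload()))
```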

To cope with this problem I've created two classes, DiskMessage and DiskFeedParser (see http://oss.netboxblue.com).

DiskMessage is a simple subclass of Message that stores message payloads in temporary files instead of RAM. Its API
is compatible with the standard Message class, although to truly avoid loading the entire message into memory you need
to use some extra methods. See the source for details.
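
The general idea of a disk-backed Message subclass can be sketched roughly as below. This is not the DiskMessage source: the class name SpooledMessage and the extra methods write_payload and iter_payload are hypothetical, the code targets a current Python 3, and it relies on the private _payload attribute of CPython's Message implementation for the compatibility path.

```python
import tempfile
from email.message import Message


class SpooledMessage(Message):
    """Hypothetical Message subclass that keeps a leaf payload in a temp file."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._spool = None  # temporary file holding the body, if any

    def write_payload(self, chunks):
        """Extra method: stream text chunks to disk without joining them."""
        if self._spool is not None:
            self._spool.close()
        self._spool = tempfile.TemporaryFile(
            mode="w+", encoding="utf-8", newline=""
        )
        for chunk in chunks:
            self._spool.write(chunk)

    def iter_payload(self, chunk_size=64 * 1024):
        """Extra method: yield the stored payload in bounded pieces."""
        if self._spool is None:
            return
        self._spool.seek(0)
        while True:
            chunk = self._spool.read(chunk_size)
            if not chunk:
                break
            yield chunk

    def get_payload(self, i=None, decode=False):
        # Compatibility path: this pulls the whole body back into memory,
        # so memory-sensitive callers should prefer iter_payload().
        if self._spool is None:
            return super().get_payload(i, decode)
        self._payload = "".join(self.iter_payload())
        try:
            return super().get_payload(i, decode)
        finally:
            self._payload = None  # drop the in-memory copy again


# Usage: write in pieces, read back in pieces or through the standard API.
part = SpooledMessage()
part["Content-Type"] = "text/plain"
part.write_payload(["hello ", "world\n"])
pieces = list(part.iter_payload(chunk_size=4))
```

Keeping get_payload working means existing code can use the object unchanged, while only callers that switch to the extra methods avoid holding the full body in RAM.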

DiskFeedParser is a hack of the current FeedParser that uses the extra methods of DiskMessage to avoid ever loading
message payloads into memory. If anyone wants to try cleanly subclassing FeedParser for this purpose instead of
just hacking it, I'd like to see the results.

Some informal tests of memory usage after parsing a 25MB email (2 large attachments), Python 2.3.3:

                                      VSZ      RSS
Parser with Message:                31840    25088
DiskFeedParser with DiskMessage:    12372     6128

Note that these classes haven't been tested extensively but seem to work. Any feedback would be 
greatly appreciated.

Regards,
Menno

-- 
Menno Smits, Senior Development Engineer
NetBox       http://netbox.biz  |  Voice        +61 500 555 357
Oxcoda       http://oxcoda.com  |  Fax          +61 500 555 358

-- BrianKirsch - 25 May 2004

Open Source Applications Foundation
Except where otherwise noted, this site and its content are licensed by OSAF under a Creative Commons License, Attribution Only 3.0.
See list of page contributors for attributions.