Skip to content

Amara 2.0.0_alpha6 XML bindery leaks memory #19

@kcgen

Description

@kcgen

bindery.parse(...) leaks roughly 200x more memory than the size of the XML document that it's parsing. A 1KB document leaks about 200KB of RAM, which quickly adds up for long running processes performing repeated parsing.

In addition, each parse(...) call adds un-collectable objects to Python's variable reference tracking system, and adds O(1) CPU processing time overhead - perhaps because of the ever-growing reference-tracking load.

How to reproduce:

Python 2.7.6 (default, May  7 2014, 07:34:22) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc 
>>> len(gc.get_objects())
3803
>>> from amara import bindery
>>> len(gc.get_objects())
12113
>>> bindery.parse(open('test.xml'))
<entity at 0xb2ee60: 1 children>
>>> len(gc.get_objects())
12285
>>> bindery.parse(open('test.xml'))
<entity at 0xbed710: 1 children>
>>> len(gc.get_objects())
12832
>>> bindery.parse(open('test.xml'))
<entity at 0xbf6f80: 1 children>
>>> len(gc.get_objects())
12764
>>> bindery.parse(open('test.xml'))
<entity at 0xbf6050: 1 children>
>>> len(gc.get_objects())
13297
>>> bindery.parse(open('test.xml'))
<entity at 0xbf3830: 1 children>
>>> len(gc.get_objects())
13229
>>> bindery.parse(open('test.xml'))
<entity at 0xbf37a0: 1 children>
>>> len(gc.get_objects())
13762
>>> bindery.parse(open('test.xml'))
<entity at 0xcd60e0: 1 children>
>>> len(gc.get_objects())
13707
>>> bindery.parse(open('test.xml'))
<entity at 0xbf6050: 1 children>
>>> len(gc.get_objects())
14240

Compare this to a well behaved parser:

Python 2.7.6 (default, May  7 2014, 07:34:22) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> import xml.etree.cElementTree as ET
>>> len(gc.get_objects())
4289
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a0197b0>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019e10>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019990>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019ed0>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019870>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019a20>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019db0>
3849
>>> ET.parse(open('test.xml')).getroot(); len(gc.get_objects())
<Element '{http://www.w3.org/2005/Atom}feed' at 0x7fa03a019ae0>
3849
>>> 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions