Ruben Laguna's blog

Aug 25, 2009 - 2 minute read - api bug exception implementation jar java library mac netbeans oome outofmemoryexception parser parsing provider service spi stax windows woodstox wrapper xml

StAX: OutOfMemoryError when parsing big files

Java 6 includes StAX. When I tried to parse an Evernote backup file with it, I got an OutOfMemoryError:

java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1520)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:486)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2679)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:548)
at

Googling a bit, I found bug report 6536111. It says this should be fixed in 1.6.0_14, but I tried Sun 1.6.0_16 with no luck: I got exactly the same error.

By the way, I get this error on both Windows Vista and Mac OS X 10.5.6.

So I decided to use Woodstox instead (which is also a StAX API implementation). It worked like a charm.

At first I thought I would need to put the Woodstox jars in the endorsed dir (-Djava.endorsed.dirs="xxx"), but that is not necessary at all.

You just put the Woodstox jars (stax2-api-3.0.1.jar, woodstox-core-lgpl-4.0.5.jar) on the classpath and that's it. In my case I was using it in a NetBeans Platform Application (RCP), so I created a NetBeans Library Wrapper with the two jars in it and made my module depend on this new library wrapper.

<class-path-extension>
  <runtime-relative-path>ext/woodstox-core-lgpl-4.0.5.jar</runtime-relative-path>
  <binary-origin>release/modules/ext/woodstox-core-lgpl-4.0.5.jar</binary-origin>
</class-path-extension>

<class-path-extension>
  <runtime-relative-path>ext/stax2-api-3.0.1.jar</runtime-relative-path>
  <binary-origin>release/modules/ext/stax2-api-3.0.1.jar</binary-origin>
</class-path-extension>

The JARs use the service provider (SPI) mechanism of jar files (entries under META-INF/services) to register themselves as a StAX implementation. No code changes are needed: you still use the StAX interfaces to do the parsing, but the Woodstox implementation is used instead.
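To confirm which implementation the lookup actually picked, one quick check is to print the class of the factory returned by XMLInputFactory.newInstance(). This is only a minimal sketch: the class name StaxImplCheck is made up for illustration, and the name it prints depends on the JDK and on which jars are on the classpath. With the Woodstox jars present I would expect a class under the com.ctc.wstx package, and the JDK's built-in factory otherwise.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper class, not part of the original application.
public class StaxImplCheck {

    // Returns the fully qualified class name of the StAX input factory
    // that the standard factory lookup resolved to at runtime.
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The printed name tells you whether the JDK parser or another
        // implementation found on the classpath is in use.
        System.out.println(inputFactoryClassName());
    }
}
```

Running it once with and once without the two jars on the classpath should make the switch visible without touching the parsing code.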

...
...
in = new FileInputStream(toAdd);
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
XMLStreamReader parser = factory.createXMLStreamReader(in);
int inHeader=0;
for (int event = parser.next(); event != XMLStreamConstants.END_DOCUMENT; event = parser.next()) {
  switch (event) {
  case XMLStreamConstants.START_ELEMENT:
    if ("title".equals(parser.getLocalName())) {
      inHeader++;
    }
....
....
....
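Since the snippet above is abbreviated, here is a small self-contained sketch of the same kind of loop. It is not the code from my application: the class TitleCounter, the method countTitles and the sample XML are made up for illustration. It only counts title start elements, and it runs unchanged against whichever StAX implementation is on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; names are illustrative only.
public class TitleCounter {

    // Streams through the document and counts <title> start elements.
    static int countTitles(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader parser = factory.createXMLStreamReader(in);
        int titles = 0;
        try {
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(parser.getLocalName())) {
                    titles++;
                }
            }
        } finally {
            // Releases parser resources; the caller still owns the stream.
            parser.close();
        }
        return titles;
    }

    // Convenience overload for small in-memory documents.
    static int countTitles(String xml) {
        try {
            return countTitles(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<notes><note><title>a</title></note>"
                + "<note><title>b</title></note></notes>";
        System.out.println(countTitles(xml)); // prints 2
    }
}
```

For a real backup file you would pass a FileInputStream to countTitles instead of the in-memory string.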