Building Apache Tika 0.6 fails if the locale is not en_US

I tried to build Apache Tika 0.6 yesterday, but the build failed because two tests did not pass:

    testExcelParserFormatting(org.apache.tika.parser.microsoft.ExcelParserTest)
    testExcelFormats(org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest)

The failures had to do with the fact that the default locale was es_ES, where the number format differs ("1.599,99" rather than "1,599.99").

    $ mvn -version
    Apache Maven 2.2.0 (r788681; 2009-06-26 15:04:01+0200)
    Java version: 1.6.0_17
    Java home: /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
    Default locale: es_ES, platform encoding: MacRoman
    OS name: "mac os x" version: "10.6.2" arch: "x86_64" Family: "mac"

I changed the locale temporarily to be able to build ...
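To illustrate why a test that compares formatted numbers against a hard-coded string can break under a different default locale, here is a minimal sketch. It is not taken from the Tika test suite; the class name `LocaleFormatDemo` and the helper `format` are made up for this example, and it only uses the standard `java.text.NumberFormat` API.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class, not part of Tika.
public class LocaleFormatDemo {

    // Formats a value with exactly two fraction digits using the given locale.
    static String format(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return nf.format(value);
    }

    public static void main(String[] args) {
        // Locale.forLanguageTag requires Java 7 or later.
        Locale spanish = Locale.forLanguageTag("es-ES");

        // The same number, two locales: the separators differ.
        System.out.println("en_US: " + format(1599.99, Locale.US));
        System.out.println("es_ES: " + format(1599.99, spanish));

        // Code that relies on the JVM default locale inherits whatever
        // the machine is configured with.
        System.out.println("default (" + Locale.getDefault() + "): "
                + format(1599.99, Locale.getDefault()));
    }
}
```

Code that formats with an explicit locale gives the same result on every machine, while code that falls back to the JVM default locale will produce different strings depending on where the build runs.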

March 10, 2010

StAX: OutOfMemoryError when parsing big files

Java 6 includes StAX. When I tried to parse an Evernote backup file with it, I got an OutOfMemoryError:

    java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1520)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:486)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2679)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:548)
        at

After Googling a bit I found bug report 6536111. It says that this should be fixed in 1.6.0_14, but I tried Sun 1.6.0_16 with no luck: I got the exact same error. ...
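For context, the kind of StAX loop that ends up in `XMLStreamReaderImpl.next` looks roughly like the sketch below. It is not the code from the original attempt: the class name `StaxCountDemo`, the helper `countElements`, and the `note` element name are assumptions chosen for illustration, and the sketch does not work around the JDK bug. It only shows the standard `javax.xml.stream` cursor API pulling one event at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class showing a plain StAX cursor loop.
public class StaxCountDemo {

    // Counts start tags with the given local name, one event at a time.
    static int countElements(InputStream in, String localName)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not ask the parser to merge adjacent character events into one.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny stand-in document; a real backup file would be read from disk.
        String xml = "<en-export><note/><note/></en-export>";
        InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "note")); // prints 2
    }
}
```

A loop like this keeps no document tree in memory, so in principle it should cope with large inputs. The stack trace above shows the heap being exhausted inside the parser's own buffering rather than in application code.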

August 25, 2009