Lucene 2.9.0/3.0.1 memory leaks

If you are experiencing high memory consumption with Lucene 2.9.0/3.0.1, it could be due…
... to a recently reported and fixed bug in StandardTokenizer, where the JFlex-generated code expanded a buffer (zzBuffer) and never trimmed it back down
... to another recently reported and fixed bug, where IndexWriter held references to the Readers used in your Fields (and if you use Apache Tika's readers, those can take up a lot of space)

April 10, 2010

Building Apache Tika 0.6 fails if the locale is not en_US

I tried to build Apache Tika 0.6 yesterday, but the build failed because two tests did not pass:

    testExcelParserFormatting(org.apache.tika.parser.microsoft.ExcelParserTest)
    testExcelFormats(org.apache.tika.parser.microsoft.ooxml.OOXMLParserTest)

The failures had to do with the fact that the locale was "es_ES", where the number format differs ("1.599,99" and not "1,599.99").

    $ mvn -version
    Apache Maven 2.2.0 (r788681; 2009-06-26 15:04:01+0200)
    Java version: 1.6.0_17
    Java home: /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
    Default locale: es_ES, platform encoding: MacRoman
    OS name: "mac os x" version: "10.6.2" arch: "x86_64" Family: "mac"

I changed the locale temporarily to be able to build ...
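The locale dependence behind those failures can be reproduced outside Tika with plain JDK classes. The following is only a minimal sketch: the class `LocaleFormatDemo` and its two helper methods are hypothetical names made up for illustration and are not taken from Tika's test code. The second helper pins the default format locale inside the JVM and then restores it, which is one possible alternative to changing the system locale; it is an assumption about how a locale-sensitive test could be made stable, not a description of how Tika fixed it.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo class, not part of Tika.
class LocaleFormatDemo {

    // Formats a value with an explicit locale, so the result does not
    // depend on the machine the code runs on.
    static String format(double value, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    // Formats a value using the JVM default format locale, which is
    // temporarily pinned to the given locale and restored afterwards.
    static String formatWithPinnedDefault(double value, Locale pinned) {
        Locale previous = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(Locale.Category.FORMAT, pinned);
        try {
            // No symbols passed: DecimalFormat picks up the default format locale.
            return new DecimalFormat("#,##0.00").format(value);
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, previous);
        }
    }

    public static void main(String[] args) {
        Locale spain = Locale.forLanguageTag("es-ES");
        System.out.println(format(1599.99, spain));                      // 1.599,99
        System.out.println(format(1599.99, Locale.US));                  // 1,599.99
        System.out.println(formatWithPinnedDefault(1599.99, Locale.US)); // 1,599.99
    }
}
```

A test that compares formatted output against a hard-coded "1,599.99" only passes when the effective locale uses a comma for grouping and a period for decimals, which is why passing the locale explicitly (or pinning it for the duration of the test) removes the dependence on the build machine.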

March 10, 2010