Code972: Coding from the back of a camel

28 Jun 2010

Lucene in Action, Second Edition: free extra chapter and coupon code for CLucene users

Last week, work on the second edition of Lucene in Action was completed, and the book was sent to print. This book, by Michael McCandless, Erik Hatcher and Otis Gospodnetić, is hands down the best guide to Lucene, the high-performance search engine library. Whether you are new to Lucene or an expert in need of a good reference, this is the book for you.

Manning, the publisher, has agreed to release a complete chapter from the book for free. This chapter covers CLucene, the C++ port of Java Lucene, and is not available anywhere else.

8 Jun 2010

Open-source Hebrew information retrieval (HebMorph, part 3)

Indexing Hebrew texts for later retrieval is not a trivial task. Several solutions exist, but as I have pointed out, they do not necessarily provide the best results. Moreover, there is no freely available solution that can index Hebrew even at a very basic level.

HebMorph was started with this in mind. It is a free, open-source effort to make Hebrew properly searchable with various information retrieval (IR) libraries, while maintaining decent recall, precision and relevance. As part of this project, we will try to come up with different approaches to indexing Hebrew and provide tools for comparing them reliably. The project's ultimate goal is to give IR libraries the best Hebrew search capabilities possible.

6 Jun 2010

Finding Hebrew lemmas (HebMorph, part 2)

As shown in the previous post, building a Hebrew-aware search engine is not trivial. Several attempts, mostly commercial, have been made to tackle this. In this post I will try to draw a complete picture of what those attempts did and show what other routes may exist. In the next post I will discuss HebMorph itself.