Code972: Coding from the back of a camel

30 May 2010

Challenges with indexing Hebrew texts (HebMorph, part 1)

Unfortunately, there is no magic trick for correctly indexing and searching Hebrew texts. Semitic languages such as Hebrew, Arabic, and Aramaic are some of the hardest languages to analyze morphologically and disambiguate. As a result, building a perfect IR solution for them, if that is possible at all, requires a lot of research and a very long process of trial and error. Some claim that Hebrew is the most complex language of all from an NLP perspective. I don't know other Semitic languages well enough to comment on that, but I do know Hebrew is complicated enough...

Since someone had to do this lengthy and tiresome work eventually, I decided to go ahead and do the heavy lifting myself instead of waiting for someone else to pick it up. That, and the fact that I needed such a solution for another product I'm working on. This effort, HebMorph, is all about making Hebrew properly searchable with various IR software libraries while maintaining decent recall, precision, and relevancy of results. As of this writing, it is still in the design phase and is available from its GitHub repository.

In a series of posts, I'm going to investigate this subject and hopefully draw a complete picture. I'll start by explaining Hebrew morphology and how it affects common IR methods. From there, I'll present several possible ways to attack the problem, and finally discuss what exactly HebMorph does and what its goals and roadmap are.

16 May 2010

BusyObject: Easily make .NET WinForms apps look busy

Imagine you are working on a .NET WinForms application that has many small user tasks, each of which requires you to disable all the input controls in your form while it runs. What happens if each of those tasks also has quite a few possible exit points? Returning and re-enabling the GUI at every one of them is quite a pain.

The following code lets you do all of this with just two extra lines of code:
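This is not the BusyObject code itself, only a minimal sketch of the same general idea under stated assumptions: a scope guard that disables the input controls when a task starts and re-enables them on every exit path (normal completion, early return, or exception). It uses a Python context manager; `FakeControl`, `busy`, and `run_task` are hypothetical names, and `FakeControl` merely stands in for a real GUI control. In C#, the comparable idiom is usually an `IDisposable` object wrapped in a `using` block.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeControl:
    """Hypothetical stand-in for a GUI input control; only tracks 'enabled'."""
    name: str
    enabled: bool = True


@contextmanager
def busy(controls: List[FakeControl]) -> Iterator[None]:
    """Disable the controls for the duration of the with-block, then restore them.

    The finally clause runs on every exit path: normal completion,
    an early return inside the with-block, or a raised exception.
    """
    # Only disable controls that are currently enabled, so controls that
    # were already disabled stay disabled afterwards.
    disabled = [c for c in controls if c.enabled]
    for c in disabled:
        c.enabled = False
    try:
        yield
    finally:
        for c in disabled:
            c.enabled = True


def run_task(controls: List[FakeControl], value: int) -> str:
    """Hypothetical task with several exit points; one extra line guards them all."""
    with busy(controls):
        if value < 0:
            return "rejected"                 # early return
        if value == 0:
            raise ValueError("nothing to do")  # exceptional exit
        return f"processed {value}"           # normal exit


if __name__ == "__main__":
    controls = [FakeControl("ok_button"), FakeControl("text_box")]
    print(run_task(controls, 5))
    print(all(c.enabled for c in controls))
```

The design choice is to record which controls the guard actually disabled, so restoring state never re-enables a control that some other logic had turned off.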

15 May 2010

Welcome post … First post

Welcome to my blog! Please check the About page to learn what you can expect to find here...
