Challenges with indexing Hebrew texts (HebMorph, part 1)
Unfortunately, there is no magic trick for correctly indexing and searching Hebrew texts. Semitic languages like Hebrew, Arabic, and Aramaic are among the hardest to analyze morphologically and to disambiguate, so creating a perfect IR solution for them, if possible at all, requires a lot of research and a very long process of trial and error. Some claim Hebrew is the most complex language of all from an NLP perspective. I don't know other Semitic languages well enough to comment on this, but I do know Hebrew to be complicated enough...
Since someone had to do this lengthy and tiresome work someday, I decided to go ahead and do the heavy lifting myself instead of waiting for someone else to pick it up. That, and the fact that I needed such a solution for another product I'm working on. This effort - HebMorph - is all about making Hebrew properly searchable by various IR software libraries, while maintaining decent recall, precision and relevancy in retrievals. As of this writing, it is still in the design phase, and is available from the GitHub repository.
In a series of posts, I'm going to investigate this subject, and hopefully draw a complete picture. I'll start by explaining Hebrew morphology and how it affects common IR methods. From there, I'll present several possible ways to attack the problem, and finally discuss what exactly HebMorph does and what its goals and roadmap are.
BusyObject: Easily make .NET WinForms apps look busy
Imagine you are working on a .NET WinForms application that has many small user tasks, each of which requires you to disable all input controls on your form. What happens if each of those tasks also has quite a lot of possible exit points? Returning and re-enabling the GUI in all of them is quite a pain.
The following code lets you do all of this with just 2 extra lines of code:
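A minimal sketch of the idea, not the original listing: it assumes a hypothetical `BusyObject` class that implements `IDisposable`, disables the given control in its constructor, and restores it in `Dispose`. The member names and the handler below (`saveButton_Click`, `ValidateInput`, `SaveData`) are made up for illustration.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: disables a control (typically the whole form)
// for as long as the object is alive, then restores its previous state.
public sealed class BusyObject : IDisposable
{
    private readonly Control _target;
    private readonly bool _wasEnabled;
    private readonly bool _hadWaitCursor;

    public BusyObject(Control target)
    {
        _target = target ?? throw new ArgumentNullException(nameof(target));

        // Remember the current state so nested or repeated use restores it correctly.
        _wasEnabled = target.Enabled;
        _hadWaitCursor = target.UseWaitCursor;

        // Disabling a container also disables its child controls.
        target.Enabled = false;
        target.UseWaitCursor = true;
    }

    public void Dispose()
    {
        // Runs on every exit path of the enclosing using block,
        // including early returns and exceptions.
        _target.UseWaitCursor = _hadWaitCursor;
        _target.Enabled = _wasEnabled;
    }
}

public class MainForm : Form
{
    // Illustrative event handler: the using statement and its braces
    // are the only additions to the task code.
    private void saveButton_Click(object sender, EventArgs e)
    {
        using (new BusyObject(this))
        {
            if (!ValidateInput())
                return; // the form is still re-enabled here

            SaveData();
        }
    }

    // Placeholder task steps, assumed for the example.
    private bool ValidateInput() { return true; }
    private void SaveData() { }
}
```

Because `Dispose` is called however the `using` block is left, no exit point has to re-enable the GUI by hand. Note that long-running work done on the UI thread will still freeze the window; this sketch only handles the enable/disable bookkeeping.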
Welcome post … פוסט ראשון (first post)
Welcome to my blog! Please check the About page so you'll know what to expect to find here...