HebMorph at SIGTRS 07/10

Today I gave a talk at SIGTRS on Hebrew search and HebMorph. The slideshow from the presentation is attached to this post. More info on HebMorph is available on the project's page. A 6-page PDF summary of the presentation, in Hebrew, is available as well: HebMorph SIGTRS presentation summ...

July 22nd, 2010
English posts, HebMorph, Hebrew posts | עברית, IR

Twitter is using BitTorrent internally for faster deployment

This is what they revealed in a video that has been circulating lately. Instead of sending thousands of git pull requests to each of their deployment servers, Twitter started using the BitTorrent protocol from Python to distribute the binaries in their deployment cycle. They report a drastic speed improvemen...

July 19th, 2010
English posts

Wikipedia offline reader with Hebrew search support

2 min read

BzReader (http://code.google.com/p/bzreader/) is a simple utility for browsing dump files downloaded from Wikipedia. Once a dump file has been downloaded, BzReader goes through all the pages and articles in it and indexes their titles. Using BzReader, it is easy to browse and search Wikipedia for specif...

July 18th, 2010
English posts, HebMorph, Lucene.Net

Testing hspell's language coverage using Wikipedia

4 min read

As part of the HebMorph project, I needed to test hspell's dictionary on a large modern corpus. Knowing how many words it can recognize is very important, and I explain why below. The project, along with usage instructions, is released under the GNU GPL and available from here. The ...

July 13th, 2010
.NET, English posts, HebMorph, hspell

More flexible Hebrew indexing with HebMorph

3 min read

Over the past week I've been working on making Hebrew indexing with HebMorph more flexible. It is now possible to perform different types of searches and to control the way lemmas are filtered. You can also perform exact searches and morphological searches on one field, without indexing the contents...

July 2nd, 2010
English posts, HebMorph, IR