Code972: Coding from the back of a camel

30 Jan 2012

RavenDB London tour

This February I'll be visiting London, consulting on RavenDB and giving our course at Skills Matter. There is also a free session, in which I'll discuss the RavenDB indexing system and how to make the most out of it.

More info on our RavenDB London course (Feb 28-29): http://skillsmatter.com/course/open-source-dot-net/ayende-rahiens-ravendb-workshop

The "In The Brain" session (Feb 28th, 18:30): http://skillsmatter.com/event/open-source-dot-net/ravendb

I might have a free evening there, and I'll be happy to discuss RavenDB or any other dev topic over a beer. Just saying.

24 Dec 2011

Hebrew search at the National Library

"All of the National Library's collections, now online," the headlines screamed. As a lover of texts, I went to see what it was all about.

The library's website (http://web.nli.org.il) provides access to the catalog and to various archives, with a free-text search box at the top of the site. Naturally, that was the first thing I tried...

Well, it seems the problem of Hebrew search was indeed known and taken into account when the site was built, and some attention was apparently given to morphological processing of some kind. It's a shame, then, that the results are far from good, or even correct.

A few representative examples, with brief conclusions alongside:

  1. A search for "רבין" (Rabin) returns completely irrelevant results in the first 6 places (with the word "רביניו" highlighted). An audio recording by "עוזר רבין" appears seventh, the first of the results that are actually for "רבין". This is particularly poor recall. The reason is that exact forms and forms suspected of being similar are given the same weight, and note that this is a word with very few possible inflections.
  2. The prefix letters מש"ה and וכל"ב are not handled properly at all: a search for "הלב" (the heart) returns no results in which the word "לב" (heart) appears, and only inflections of "לב" with the prefix ה' are retrieved. This is not the right way to do it. We want to rank exact matches higher, but without losing relevant matches that were originally written without the מש"ה and וכל"ב letters.
  3. Gershayim (the double-quote mark used in Hebrew acronyms) are not supported. At all. Searching for צה"ל, רמב"ם or רמב"ן yields no results (but צהל and רמבם do).
  4. Full vs. defective spelling (ktiv male / ktiv haser) is not supported at all. Searches for אמא / אימא, חנוכיה / חנוכייה, ספריה / ספרייה and others return completely different results.
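To make points 2 and 3 concrete, here is a minimal C# sketch of weighted query expansion for a single term. It is not HebMorph's implementation or API: `HebrewQueryExpander`, `Normalize` and `Expand` are hypothetical names. Stripping a single prefix letter with a fixed 0.5 weight is a deliberately naive assumption, since real morphological analysis needs a lexicon to tell a prefix from a word's own first letter. Point 4 is not attempted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not HebMorph's API: weighted expansion of one Hebrew query term.
public static class HebrewQueryExpander
{
    // The prefix letters (מש"ה וכל"ב) that may be attached to the start of a word.
    const string PrefixLetters = "משהוכלב";

    // Drops gershayim, written either as ASCII '"' or as Hebrew ״ (U+05F4),
    // so that צה"ל and צהל are treated as the same term.
    public static string Normalize(string term) =>
        term.Replace("\"", "").Replace("\u05F4", "");

    // Returns the exact (normalized) form with full weight, plus the form with one
    // leading prefix letter removed at a lower weight, so exact matches rank higher
    // but prefix-less occurrences are still retrieved.
    public static List<(string Term, double Boost)> Expand(string term)
    {
        string normalized = Normalize(term);
        var expansions = new List<(string Term, double Boost)> { (normalized, 1.0) };

        // Naive rule: strip only when at least two letters remain. It will also strip
        // letters that belong to the word itself (e.g. the ב in בית).
        if (normalized.Length > 2 && PrefixLetters.IndexOf(normalized[0]) >= 0)
            expansions.Add((normalized.Substring(1), 0.5));

        return expansions;
    }
}
```

Each (term, boost) pair would become a clause in the search engine's query, so "הלב" still finds documents containing "לב", just ranked below exact matches.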

All of the above examples lead me to believe this is query expansion of some sort, and in any case it is clearly a very lightweight search engine for the national book collection. The search is not exhaustive, and has very low precision and recall. In several talks I've given on the subject I have already shown examples of this on sites like Ynet, the Hebrew Wikipedia and Tapuz, but from the National Library I expected more...

The HebMorph project, which you can read plenty about on this site too, was built for exactly this purpose, and it is open source (with an option for commercial use). A short session with the live demo shows that the engine already handles the points mentioned above...

6 Dec 2011

RavenDB Caching done right (EventsZilla part II)

In the previous post we created the basics for an events publishing application, and discussed the modeling aspect of things.

I put some more work into the app, and now it actually works and looks pretty nice. Queries and loads are in place for the front-end, so it is time to visit one key feature of RavenDB - Caching.

Basic caching

The RavenDB Client API provides automatic, out-of-the-box caching for all read operations. Every data request sent to the server is remembered by the document store object, so subsequent read operations that are detected as identical can be answered from the cache once the server confirms the cached data hasn't changed.

However, it is important to beware of common pitfalls that may prevent you from taking advantage of this handy feature. While there's no real way to get this wrong with simple Load operations, it is very easy to do so when querying.

For example, the most common query in an application like EventsZilla is to get events starting before or after a certain point in time, usually DateTimeOffset.Now. However, a query like this is guaranteed never to use the cache: DateTimeOffset.Now changes with every call, so the query is effectively different every time it is issued.

In EventsZilla we can fix this relatively easily by lowering the DateTimeOffset resolution when querying. Another approach is to round the value up (or down). The resolution or rounding approach we choose determines how much this query benefits from the cache.
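A minimal sketch of the rounding approach follows. It is not the EventsZilla code; `CacheFriendlyTime` and `RoundDown` are hypothetical names, and the 5-minute resolution used in the example is an arbitrary choice.

```csharp
using System;

public static class CacheFriendlyTime
{
    // Rounds a timestamp down to the start of the interval it falls in, keeping its offset.
    // Every call made within the same interval returns exactly the same value, so a query
    // built from it is identical too and the client-side cache can serve it.
    public static DateTimeOffset RoundDown(DateTimeOffset value, TimeSpan resolution)
    {
        if (resolution <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(resolution));
        long ticks = value.Ticks - (value.Ticks % resolution.Ticks);
        return new DateTimeOffset(ticks, value.Offset);
    }
}
```

A query would then compare against something like `CacheFriendlyTime.RoundDown(DateTimeOffset.Now, TimeSpan.FromMinutes(5))` instead of `DateTimeOffset.Now`. The coarser the resolution, the more often the cache is hit, at the cost of the cutoff being up to that much out of date.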

Relevant code can be found here.

Aggressive caching

Basic caching is very effective, requires no action on the user's end, and is a great feature for automatically improving your application's performance. However, a request is still sent to the server with every read operation, to make sure the cache never goes stale. The actual benefit of basic caching is getting back a quick, thin 304 response (HTTP for "Not Modified") instead of a complete 200 response with all the requested data.
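To illustrate the mechanism, here is a hypothetical in-memory model of ETag-based conditional requests. It is not RavenDB's implementation and no real HTTP is involved. `FakeServer` and `CachingClient` are made-up names: the server answers 304 with no body when the client already holds the current version.

```csharp
using System;

// Hypothetical stand-in for the server: every change to the data produces a new ETag.
public class FakeServer
{
    string data = "v1";
    int version = 1;

    public string CurrentETag => "\"" + version + "\"";

    // Mimics a conditional GET: if the client holds the current version, answer
    // 304 with no body; otherwise answer 200 with the full data and the new ETag.
    public (int Status, string ETag, string Body) Get(string ifNoneMatch)
    {
        if (ifNoneMatch == CurrentETag)
            return (304, CurrentETag, null);
        return (200, CurrentETag, data);
    }

    public void Update(string newData) { data = newData; version++; }
}

// Hypothetical client-side cache: a request is always sent, but it carries the ETag
// of the cached copy, so an unchanged resource costs only a thin 304 response.
public class CachingClient
{
    readonly FakeServer server;
    string cachedETag, cachedBody;
    public int FullResponses { get; private set; }
    public int NotModifiedResponses { get; private set; }

    public CachingClient(FakeServer server) { this.server = server; }

    public string Read()
    {
        var response = server.Get(cachedETag);
        if (response.Status == 304)
        {
            NotModifiedResponses++;
            return cachedBody;
        }
        FullResponses++;
        cachedETag = response.ETag;
        cachedBody = response.Body;
        return cachedBody;
    }
}
```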

At times we load an object (or perform a query) whose changes we don't care about for a certain period of time, or that we simply don't expect to change. If we choose to, we can tell RavenDB not to query the server at all if it has a cached response that is no older than a given amount of time.

This feature is called Aggressive Caching. It is aggressive in the sense that it doesn't look outside the cache at all. Unlike basic caching, it is an opt-in feature.

In EventsZilla, this is exactly the case with the website-wide config object. We don't expect it to change often, and when it does, we can tolerate some delay before the changes show up on the website.

All we need to do to make it happen is load that object within an aggressive caching context, and the RavenDB Client API will take care of the rest for us.

Using Aggressive Caching is as simple as this:

using (RavenSession.Advanced.DocumentStore.AggressivelyCacheFor(TimeSpan.FromMinutes(30)))
{
	var siteConfig = RavenSession.Load<SiteConfig>(SiteConfig.ConfigName);
}
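The difference from basic caching is that, within the window, no request is sent at all, so a stale value may be served. The following hypothetical model (not RavenDB's implementation; `AggressiveCache<T>` and its members are made-up names) shows that trade-off, with an injectable clock so the behaviour can be checked.

```csharp
using System;

// Hypothetical model of aggressive caching: within maxAge the cached value is
// returned without contacting the server, even if the data has since changed.
public class AggressiveCache<T>
{
    readonly Func<T> fetchFromServer;
    readonly TimeSpan maxAge;
    readonly Func<DateTimeOffset> clock;
    T cached;
    DateTimeOffset fetchedAt;
    bool hasValue;

    public int ServerCalls { get; private set; }

    public AggressiveCache(Func<T> fetchFromServer, TimeSpan maxAge, Func<DateTimeOffset> clock)
    {
        this.fetchFromServer = fetchFromServer;
        this.maxAge = maxAge;
        this.clock = clock;
    }

    public T Get()
    {
        var now = clock();
        // Go to the server only when nothing is cached or the cached value is older than maxAge.
        if (!hasValue || now - fetchedAt > maxAge)
        {
            cached = fetchFromServer();
            fetchedAt = now;
            hasValue = true;
            ServerCalls++;
        }
        return cached;
    }
}
```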

More on caching

Is in the second part of the excellent RavenOverflow video, available here.

29 Nov 2011

EventsZilla: RavenDB modeling walkthrough

I needed a simple event publishing application. I also felt like doing another RavenDB sample app and a RavenDB post on it. This is how EventsZilla came to life.

EventsZilla (full sources here: https://github.com/synhershko/eventszilla) is meant to be a simple web application for announcing events along with their schedules, and for viewing past and future events. People should be able to register for an event without signing up to the website, and to view slides and other content once it becomes available after the event.

This post is being written during development, describing each stage and the considerations leading to the next. As such, the code I link to does not necessarily work, although it should. I will probably have some fixes and amendments made to the code after publishing this post.

Initial modeling

When we speak of an events publishing application, what are we looking at? The most basic items are an Event with a means of registration, and a list of sessions for each Event. Each event should have a registration window, a venue in which it takes place, and obviously a title and description.

For each session in an event we want to have a Presenter (possibly more than one), a title, a brief (aka abstract), and the times at which the session starts and ends. Note that the start and end times of the event are derived directly from the first and last sessions of the event. For now we call a session a "Schedule slot".

Unlike with a relational model, with RavenDB we can sketch the entire thing as one class and just use it. There is one exception, though: at this stage we already know that venues and presenters may appear in several different events (possibly even the same presenter in multiple sessions of the same event), so rather than storing them directly under the event, we link to them by storing only their IDs. They can then be retrieved efficiently using RavenDB's Includes feature.

We end up with this Event class:

	using System;
	using System.Collections.Generic;
	using System.Linq;

	public class Event
	{
		public Event()
		{
			Schedule = new List<ScheduleSlot>();
		}

		public int Id { get; set; }
		public string Title { get; set; }
		public string Slug { get; set; }
		public string Description { get; set; } // markdown content

		public string VenueId { get; set; }

		public DateTimeOffset CreatedAt { get; set; }

		public DateTimeOffset RegistrationOpens { get; set; }
		public DateTimeOffset RegistrationCloses { get; set; }
		public int AvailableSeats { get; set; }

		public class ScheduleSlot
		{
			public List<string> PresenterIds { get; set; } // list of person IDs
			public string Title { get; set; }
			public string Brief { get; set; } // markdown
			public DateTimeOffset StartingAt { get; set; }
			public DateTimeOffset EndingAt { get; set; }
		}
		public List<ScheduleSlot> Schedule { get; set; }

		public DateTimeOffset StartsAt
		{
			get
			{
				var firstSession = Schedule.OrderBy(x => x.StartingAt).FirstOrDefault();
				return firstSession == null ? DateTimeOffset.MinValue : firstSession.StartingAt;
			}
		}

		public DateTimeOffset EndsAt
		{
			get
			{
				var lastSession = Schedule.OrderByDescending(x => x.EndingAt).FirstOrDefault();
				return lastSession == null ? DateTimeOffset.MaxValue : lastSession.EndingAt;
			}
		}
	}

Since an event schedule has no meaning outside the scope of an event, it is best persisted there as well. This also means the whole schedule will be loaded with the event in every Load or Query operation that returns this event. At this stage we are fine with that.

The StartsAt and EndsAt properties are computed within the Event class and persisted as part of the document, to take some pressure off the indexes we are going to create. This way, as much business logic as possible resides in the actual domain types rather than in the indexes.

The Venue and Presenter classes are quite trivial, so they won't be shown here.

The actual code for this phase is in this github commit.

Event registration

Registering for an event is quite a common operation in our system, and on a crowded website multiple registrations for the same event can be made at the same time.

Like an event schedule, an event registration has no meaning at all outside the scope of the event itself, at least as long as we don't try to keep track of attendees (which we don't). Unlike the schedule, however, the attendees list is going to change a lot, and often concurrently. For this reason, keeping this list within the Event object itself doesn't make sense, as it would force us to deal with conflict resolution whenever two or more people register for the same event at the same time.

Another reason not to save registrations within the Event itself is that we don't really care about them when we load an event, and that list can grow quite big for certain events. We want to make sure the Event object only holds data we are going to access frequently in that context, and the registrants list is not that type of data.

To keep the registrants list separate, while still making sure we don't need to worry about possible conflicts, I created a simple EventRegistration class that holds all that data, and each registration is persisted as a document of its own. Whenever we need to know the number of people who registered for the event, we query the DB for a count of registrations for that event, using a simple static index we defined upfront.

	public class EventRegistration
	{
		public string EventId { get; set; }
		public string RegistrantEmail { get; set; }
		public string RegistrantName { get; set; }
		public DateTimeOffset RegisteredAt { get; set; }
	}
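The post doesn't show the index itself. As a hedged illustration of what such a count computes, here is plain in-memory LINQ in the map/reduce shape that RavenDB indexes can take. It is not the EventsZilla index, and `RegistrationCounts` and `CountByEvent` are hypothetical names. Because every registration is its own document, concurrent registrations never write to the same document, which is what removes the conflict problem.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class EventRegistration
{
    public string EventId { get; set; }
    public string RegistrantEmail { get; set; }
    public string RegistrantName { get; set; }
    public DateTimeOffset RegisteredAt { get; set; }
}

public static class RegistrationCounts
{
    // In-memory illustration of a map/reduce count: map each registration document
    // to (EventId, Count = 1), then reduce by summing the counts per EventId.
    public static Dictionary<string, int> CountByEvent(IEnumerable<EventRegistration> registrations) =>
        registrations
            .Select(r => new { r.EventId, Count = 1 })             // map
            .GroupBy(x => x.EventId)                               // reduce: group by key...
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Count));   // ...and sum the counts
}
```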

Actual code for this phase is in this commit.

Available seats

Still in the context of registrations, it is important to note that by design we are not necessarily blocking registration for an event the second it is full. The reason is that we get the number of people registered for the event through an index query, and while RavenDB's indexing process is quite fast, on a busy website this query may return a number that is not entirely up to date.

In EventsZilla we made a deliberate design decision not to think like a computer. A computer knows to respect a hard limit, but in real life we rarely do. If your event has 100 seats, couldn't you squeeze in 10-15 more? And are you really that sure all of the original 100 registrants will show up?

For that reason we don't mind if the count we got back from our query is not the exact count at the point in time when we issued the query. This is quite a common practice with Eventual Consistency - we don't try hard to respect hard limits that don't really exist anyhow, and wonderful things happen.

RavenDB can tell us when the count we got is stale. Waiting for non-stale results is possible, but it is really not recommended in production, as it is going to significantly slow down your system. In some cases it can also result in an infinite wait. So don't do that.

If we really needed to keep to a hard limit, we could add a RegistrantsCount property to our Event class and increment it with every registration. It requires a bit more work to make sure concurrent writes are detected and the losing write is retried, so that no incorrect counts happen, but it ensures we know the exact count at all times, since we could then retrieve that event using a Load operation, which is ACID.
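A minimal sketch of that approach, assuming optimistic concurrency: read the count together with a version number, write only if the version hasn't changed since the read, and otherwise reload and retry. The types here are hypothetical and model the document store in memory with a lock. In RavenDB the version check is based on the document's ETag and performed by the database, not by code like this.

```csharp
using System;

// Hypothetical in-memory stand-in for an Event document with a RegistrantsCount
// property and a version number (the role an ETag plays in a document database).
public sealed class VersionedEvent
{
    readonly object gate = new object();
    int registrantsCount;
    long version;

    public (int Count, long Version) Load()
    {
        lock (gate) return (registrantsCount, version);
    }

    // Compare-and-swap: the write succeeds only if nobody else wrote since our Load.
    public bool TryStore(int newCount, long expectedVersion)
    {
        lock (gate)
        {
            if (version != expectedVersion) return false;
            registrantsCount = newCount;
            version++;
            return true;
        }
    }
}

public static class Registrations
{
    // Registers one attendee while enforcing a hard limit on available seats.
    public static bool TryRegister(VersionedEvent evt, int availableSeats)
    {
        while (true)
        {
            var (count, version) = evt.Load();
            if (count >= availableSeats) return false;           // event is full
            if (evt.TryStore(count + 1, version)) return true;   // no concurrent write happened
            // A concurrent registration got in first: reload and try again.
        }
    }
}
```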

Next in line

As time allows, I will explore our options for supporting multiple tracks in an event, and for better supporting events with only one session. Also on my to-do list is letting each schedule slot have materials such as slides, video, code samples, etc. Stay tuned.

7 Nov 2011

The Oredev “Lost Session”

This week I'm in Malmo, Sweden for Oredev, looking forward to a great conference.

Wednesday evening, about an hour after the last session of the day, I will be giving a RavenDB session at KAN's offices. There are a few seats available; more details and registration here: http://thelostsession.kan.se/.

If you live nearby, or are attending the conference, we would love to see you there. The evening is free, and there will be food and beer.

28 Oct 2011

Creating a documentation system – Part 1

A while ago we started revamping the documentation of RavenDB. This work resulted in quite a nice documentation system, which this post describes in general terms; more posts will follow as we make more progress and add new features to it.

It had been clear for some time that RavenDB needed much better organized docs, and they had to be complete too. A lot of content is scattered around the net in blogs and FAQs, and we started gathering it all, arranging it and rewriting the docs almost from scratch. It was also obvious that some content is worth making available but isn't really "documentation" content, so we had to figure out what to do with such content too.

Also, one of the reasons the old docs were scattered around in blogs and FAQs is the rapid development pace of RavenDB. So on top of all that, we needed a way to keep up with it, for example by having working code samples at all times.

Documentation changes over time as best practices change or new features are added, so we needed to take that into account as well, by being able to version the docs and see past revisions. Another important factor was community content: we wanted the community to be able to respond, suggest fixes or additions, and offer new content, even if it is not "documentation" per se.

A wiki sounded like a bit too much, and we didn't really want to build something of our own. We played with some ideas for a while, until we had it all figured out.

Don't reinvent the wheel

The most important rule of all - don't waste your time creating a tool that you already have on your belt. In our context those are git and Markdown.

With git, we get easy content versioning, and it gets extremely easy to accept patches from other writers (forks and pull requests). Versioning is simply a matter of branching or tagging - the master HEAD is always the latest docs for the latest version, and whenever we want to mark a version we just create a branch from the working copy, or tag it. Obviously, github plays a huge role here - our documentation system even has an issue tracker, for heaven's sake...

And of course, Markdown is a natural fit: a super-simple text-based markup language that is very easy to write in. Combined with git it really shines and produces very nice diffs. It has never been so easy to track documentation revisions.

Markdown also allows us to export documentation in various formats. On our website we show it as HTML, but it can also be compiled into a PDF book and other e-book formats. We will cover this in detail later in the series.

This is how we got our full-featured, third-party-hosted wiki for documentation, which is available on GitHub: https://github.com/ravendb/docs.

Editorial notes

Now that we had the basics figured out, we needed to decide on a structure. This actually proved very easy to do. Since we use git, all changes, including moving files around, are recorded and can be tracked. So if we represent each documentation item as a file, and store files under hierarchical folders we are pretty much done.

At the time of this writing we still don't have all the terminology figured out: what is a section, a sub-section or a chapter. At this stage we don't really care about all that. We just write the docs in whatever way makes sense, and once it is all done we should be able to revisit that.

We also created a Knowledge-Base section on the website, where content that is not "documentation" per se can still be published and viewed. All content that is considered out of scope for the actual documentation will be posted there - official articles by Hibernating Rhinos side-by-side with user generated content. The KB is a simple web application and has nothing to do with the documentation system, but in the larger scope - the product - it is important to have, both as means of providing extra content and for interaction with the community.

Code samples QA

To make sure all the code samples are up to date, we created a project containing all the sample code used in the docs, and compile it to verify the code is valid. With every new official release, we update the project to that release and compile again, to make sure nothing requires changing. This way the code samples are guaranteed to stay up to date and working, which could never be guaranteed for code inlined in the docs themselves.

We added our own Markdown syntax that points to a code file, and we write the code snippets within named #regions, which makes it easier to track and identify the code relevant to each page. If you ever wondered what #regions in VS are for, now you know :)

In the documentation, it looks like this:

{CODE region_name@folder/file.cs /}

And an actual code file with such regions can be found here: https://github.com/ravendb/docs/blob/master/code-samples/Intro/BasicOperations.cs

When the docs are compiled, the custom Markdown syntax is processed before the Markdown itself, using a tool we developed that is now part of the documentation repository. The tool goes to the source files directory, locates the file, extracts the requested region, normalizes the line spacing and injects the result into the Markdown source. Only then is the source compiled and saved to the specified output.
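The actual tool lives in the docs repository. The following is only a hedged C# sketch of the steps just described, with hypothetical names (`CodeSnippetInjector`, `Inject`, `ExtractRegion`). It assumes region names consist of word characters and hyphens and that regions are not nested, and it interprets "normalizing the line spacing" as removing the indentation common to all lines of the snippet.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CodeSnippetInjector
{
    // Matches the custom syntax: {CODE region_name@folder/file.cs /}
    static readonly Regex CodeTag =
        new Regex(@"\{CODE\s+(?<region>[\w-]+)@(?<path>[^\s]+)\s*/\}");

    // Replaces every {CODE ...} tag in the Markdown source with the named region's code.
    // readFile maps a relative path to file contents, e.g. p => File.ReadAllText(Path.Combine(root, p)).
    public static string Inject(string markdown, Func<string, string> readFile) =>
        CodeTag.Replace(markdown, m =>
            ExtractRegion(readFile(m.Groups["path"].Value), m.Groups["region"].Value));

    public static string ExtractRegion(string source, string regionName)
    {
        var lines = source.Replace("\r\n", "\n").Split('\n');
        int start = Array.FindIndex(lines, l => l.Trim() == "#region " + regionName);
        if (start < 0) throw new InvalidOperationException("Region not found: " + regionName);
        int end = Array.FindIndex(lines, start + 1, l => l.Trim().StartsWith("#endregion"));
        if (end < 0) throw new InvalidOperationException("Unterminated region: " + regionName);

        var body = lines.Skip(start + 1).Take(end - start - 1).ToList();

        // Normalize indentation: strip the smallest leading whitespace among non-blank lines.
        int indent = body.Where(l => l.Trim().Length > 0)
                         .Select(l => l.Length - l.TrimStart().Length)
                         .DefaultIfEmpty(0)
                         .Min();
        return string.Join("\n",
            body.Select(l => l.Length >= indent ? l.Substring(indent) : l.TrimStart()));
    }
}
```

Only after this pass would the resulting Markdown be handed to the Markdown compiler, so a broken region reference fails the build instead of silently producing an empty snippet.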

Next...

In the next posts we will look at how the docs are actually compiled, how they are browsed in the website, and how we allow for versioning of docs.

16 Oct 2011

Whose bug is it anyway? Google vs Microsoft

Consider the following code (.NET):

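using System.Collections.Generic;
using System.ServiceModel.Syndication;
using System.Xml;

public static class FeedReader
{
	// The Google Groups "topics" feed used as an example below
	private const string ListAtomUrl = "http://groups.google.com/group/ravendb/feed/atom_v1_0_topics.xml";

	public static IEnumerable<SyndicationItem> ReadFeed()
	{
		// Disposing the reader at the end of the using block also closes it,
		// so an explicit reader.Close() is not needed
		using (var reader = XmlReader.Create(ListAtomUrl))
		{
			// Throws an XmlException when the feed holds a value it can't parse,
			// such as the malformed <updated> date shown below
			var feed = SyndicationFeed.Load(reader);
			return feed == null ? null : feed.Items;
		}
	}
}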

This is simple code for reading an Atom feed using the relatively new .NET syndication API. When executed against a Google Groups Atom feed (http://groups.google.com/group/ravendb/feed/atom_v1_0_topics.xml, for example), it fails miserably with this error:

Error in line 10 position 22.

An error was encountered when parsing a DateTime value in the XML.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.Xml.XmlException: Error in line 10 position 22. An error was encountered when parsing a DateTime value in the XML.

The reason for this error is this XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>-0-0T::Z</updated>
  <generator uri="http://groups.google.com" version="1.99">Google Groups</generator>
  <entry>
  <author>

Notice the "updated" tag. This only happens for the "topics" feed, not the "new messages" feed Google provides for each group.

So whose bug is it? My bet is on Google. The Atom RFC (RFC 4287) requires the "updated" element to contain an RFC 3339 date-time, with no provision for an "n/a" value, so this value is simply not valid.

However, Microsoft is at fault here too, for not providing a way to tolerate these kinds of errors. After all, the syndication API is meant to be used with _external_ services, which the developer has no control over, yet the API becomes useless at the slightest bug in a feed provider's output. In fact, no other reader I use had problems reading that feed.
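Until either side fixes this, one possible workaround is to clean up the feed before handing it to the syndication API. This is a hedged sketch using LINQ to XML, not a documented fix: `AtomSanitizer` is a hypothetical helper that drops `updated`/`published` elements whose value can't be parsed as a date. It assumes the syndication formatter accepts a feed in which those elements are missing, an assumption this sketch does not verify.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class AtomSanitizer
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Removes atom:updated / atom:published elements whose content is not a valid
    // xs:dateTime (the format XmlConvert parses), such as Google Groups' "-0-0T::Z".
    public static XDocument RemoveInvalidDates(XDocument feed)
    {
        var invalid = feed.Descendants()
            .Where(e => (e.Name == Atom + "updated" || e.Name == Atom + "published")
                        && !IsValidDate(e.Value))
            .ToList(); // materialize first: removing while enumerating would break the query
        foreach (var element in invalid)
            element.Remove();
        return feed;
    }

    static bool IsValidDate(string value)
    {
        try
        {
            XmlConvert.ToDateTimeOffset(value.Trim());
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

The cleaned document could then be passed on with something like `SyndicationFeed.Load(AtomSanitizer.RemoveInvalidDates(XDocument.Load(ListAtomUrl)).CreateReader())`.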

21 Sep 2011

Orev: The Apache OpenRelevance Viewer

It has been quite some time since I said I'd be working on this; I got caught up in other pressing matters and had to drop it for a while. But it is all for the best: the technology I used for this new version is a perfect fit for this application, and it wasn't available back then. I'll be addressing the technical aspects later in this post and in some follow-up posts.

My first interest in the OpenRelevance project, and one of the main reasons I created Orev, was the HebMorph project. Using Orev, I'm hoping to be able to create an environment where tools for Hebrew IR can be tested and compared, to produce the ultimate Hebrew analyzer, for Lucene and other libraries as well.

Before anything else, the complete source code is available at https://github.com/synhershko/Orev.

I have a hosted version too, and I will publish a link to it soon, once I get some things sorted out and get feedback from other people involved in this project.

What is this?

The OpenRelevance project is an Apache project aimed at creating materials for relevance testing in information retrieval (IR), Machine Learning and Natural Language Processing (NLP). Think TREC, but open-source.

These materials require a lot of management work and many human hours to collect corpora and topics, and then to judge them. Without going into too many details about the actual process here, it essentially means crowd-sourcing a lot of work, and that assumes the OpenRelevance project has the proper tools to offer the people recruited for the work.

Since no such tool existed, the Viewer (Orev) is meant to be exactly that, minimizing the overhead required from both the project managers and the people doing the actual work. By providing simple facilities to add new Topics and Corpora, and to feed documents into a corpus, it makes the surrounding infrastructure very easy to manage. And with a nice web UI for judging documents, the recruits' work should be very easy to grok.

More technical details

Orev is multi-lingual from the ground up, and is heavily user-based. Every user can view available topics and corpora, and make judgments based on the languages he speaks.

Managers can add new topics, create new corpora and feed those with documents. Documents can be added to a corpus, or updated, at a later time, too.
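A hedged sketch of the language rule described above follows. `Topic`, `User` and `TopicsFor` are hypothetical simplifications, not Orev's actual model classes (those are in the repository linked earlier).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model; Orev's real classes may look different.
public class Topic
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Language { get; set; }   // e.g. "he", "en"
}

public class User
{
    public string Name { get; set; }
    public List<string> Languages { get; } = new List<string>();
}

public static class Judging
{
    // A user is offered only the topics written in a language they speak.
    public static IEnumerable<Topic> TopicsFor(User user, IEnumerable<Topic> topics) =>
        topics.Where(t => user.Languages.Contains(t.Language, StringComparer.OrdinalIgnoreCase));
}
```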

We will probably also let users submit topics, and so on.

Even more technical details

When I started working on this I was using NHibernate, and spent some time designing a DB schema, fighting with ASP.NET MVC and all that. Now that MVC 3 is out and RavenDB is rocking worlds, it was a matter of a few hours to start it all again from scratch. Using a schema-less DB is really what made it possible to do this in so few hours, not counting some dilemmas and frustrations which I will be blogging about soon.

In the original design I intended to load corpus documents from external sources, or to store them on the file system. Now that Orev uses RavenDB, a document database, storing the documents in the DB itself actually makes sense. This is also what lets us update a corpus later with new documents, or patch existing ones.

What's next

We need to run a lot of tests, get a lot of feedback and improve accordingly. The first step is obviously gathering content and raising interest, so if you find this post / project interesting - please spread the word.

Orev is currently using the default ASP.NET MVC theme. If there's an HTML5/CSS designer and magic worker who can take up the task of redesigning it to be more inviting and easier to work with, it is something we could definitely use.

I have enabled the github bug tracker in the Orev source repository. Please use it for reporting bugs or asking for features.

When the dust settles and actual judging begins on a regular basis, we will start working on code to output statistics, in preparation for the original goal of the OpenRelevance project: measuring the performance of IR software (+ NLP + ML, of course), and producing bleeding-edge analyzers for various languages.
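The statistics in question would typically be standard IR measures. As a hedged example (not Orev code; `RelevanceMetrics` is a hypothetical name), this computes precision at k and recall for one topic from a ranked result list and the set of documents judges marked relevant, assuming document IDs in the ranking are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelevanceMetrics
{
    // Fraction of the top-k ranked documents that were judged relevant.
    public static double PrecisionAtK(IReadOnlyList<string> ranked, ISet<string> relevant, int k)
    {
        if (k <= 0) throw new ArgumentOutOfRangeException(nameof(k));
        int hits = ranked.Take(k).Count(d => relevant.Contains(d));
        return (double)hits / k;
    }

    // Fraction of all relevant documents that appear anywhere in the ranked results.
    public static double Recall(IReadOnlyList<string> ranked, ISet<string> relevant)
    {
        if (relevant.Count == 0) return 0.0;
        return (double)ranked.Count(d => relevant.Contains(d)) / relevant.Count;
    }
}
```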

30 Jun 2011

Practical Hebrew search – Open2011 presentation

Attached to this post is the presentation I gave today at Open2011 in Tel Aviv.

The sample app can be found here: http://hebmorph.code972.com/. It is also going to be HebMorph's home in a few weeks, once I'm done creating all the necessary content.

As promised, I will be posting more details on some interesting findings on Hebrew search, and comparisons with Google search. I want those posts to be more comprehensive, so they will be up in a few weeks' time.



28 Jun 2011

Some words on HebMorph’s licensing

Without being a lawyer, and trying really hard not to become one, it is not easy to be the author of an open-source project. Apparently it takes quite a lot of thought, and definitely a lot of reading, to make sure the code you release has an appropriate license that correctly expresses your intent. If you don't pay enough attention, you will probably end up with a license that doesn't enforce what you intended at all.

This is what happened to me with HebMorph, and this post is here to clarify everything that needs clarifying, and to explain the reasoning behind the recent license change to HebMorph.

As I said in an e-mail conversation we recently held on HebMorph's mailing list, this project is all about research and sharing of information. We WILL reach our goals, some sooner than others, and when we do, the knowledge we gathered will be free for all to learn from and use. However, since we have a very long road ahead of us, I needed to make sure this project can support itself. I spent a lot of time researching options, charting a path, writing code, testing approaches and a lot more, and to be able to keep doing that in large blocks of time (and not just occasionally) we needed income.

This is when I decided to charge for any commercial use of code released by the HebMorph project. It is actually pretty simple and very fair: I release my work for all to see and use free of charge. If, however, you make a profit from my work, I'd like you to support the project. Since we are aiming at quite a small market, relying on donations won't cut it, so I decided to use a license that would allow me to enforce that.

I explicitly stated more than once, and in more than one place, that I'm not after anyone's money. This project grew out of sheer interest, and it will definitely continue to evolve. This is why HebMorph doesn't have a price tag; if you want to use it in a commercial product, contact me and we'll figure something out. An arrangement that is fair for both parties.

Unaware of many legal details, I chose GPLv2 as HebMorph's license. It seemed promising: any derivative work would require the consuming application to be released under GPLv2 as well, and since most companies would like to avoid that, they would pay for a commercial license. It was also the license hspell uses, and since some parts of HebMorph are definitely a derivative work of hspell, HebMorph had to be released under a compatible license, or under GPLv2 itself. Problem solved, or at least so I thought.

Following a recent user inquiry, I found out my license of choice was in fact not suitable at all. First, it has many flaws and loopholes that make it quite ineffective at enforcing what I wanted. It is practically the last license I would choose for any modern software; here's a good read on why.

Secondly, and no less important, GPLv2 is incompatible with the Apache Software License, under which Lucene and Solr are released. Since our main platform is Lucene, we can't afford that.

Now that I realized all this, I've changed HebMorph's license to AGPLv3. This license is based on GPLv3 (itself an improvement over GPLv2), but adds a clause that extends its terms to software users interact with over a network, such as websites and web services, and thereby seals off the infamous GPLv2 loophole. Since AGPLv3 isn't compatible with GPLv2, I had to get explicit permission from hspell's authors to keep using it, and they granted it, with the exception that the hspell files distributed with HebMorph may only be used for search purposes.

Now, you may notice how frequently I used the word "fair" when describing the license selection process. This is because I'm not here to chase down and seal loopholes, or to make sure everyone making a profit from my work is paying back. I enjoy doing other things, not that. I expect users to be fair: if they make a profit from a product that uses HebMorph in one way or another, I expect them to give back. There are probably thousands of ways to bypass any license, AGPL included, so I'm making it clear that I release HebMorph under the AGPL and also under the expectation of fairness. At some point I actually considered using the RPL, but decided it was too restrictive and would probably create more problems than it would solve. So I selected AGPLv3, and let me say this again: please act in good faith.

And just to be sure: as far as I'm concerned, using any HebMorph code through Solr is just the same as using it through Lucene. Solr dynamically links the jars in what falls under the very definition of "derivative work", and in case that was in doubt, it isn't now. I'm stating this explicitly, so even if there is a loophole here (which I'm quite certain there is not), it is now covered by the license's definition of "use": if your application uses Solr, and Solr uses HebMorph, your application is effectively using AGPLv3 software and needs to be AGPLv3 as well.

Hopefully this clarifies some things about HebMorph, and as always I'd love to hear any thoughts on this.

Due to the unintended conflict of licenses, any use of previous versions of HebMorph with Lucene/Solr has to move to the new license.

As before, OSS projects and non-profit closed source projects are welcome to use HebMorph with no charge, but the latter should contact me in advance to discuss some terms.