Code972 Coding from the back of a camel

28 Oct 2011

Creating a documentation system – Part 1

A while ago we started revamping the documentation of RavenDB. That work resulted in quite a nice documentation system, which this post describes in general terms; more posts will follow as we make progress and introduce new features to it.

For some time now it has been clear that RavenDB needed much better organized docs, and that they had to be complete too. A lot of content is scattered around the net in blogs and FAQs, so we started gathering it all, arranging it and rewriting the docs almost from scratch. It was also obvious that some content is worth making available but is not really "documentation" - so we had to figure out what to do with that content too.

Also, one of the reasons the old docs were scattered around in blogs and FAQs is RavenDB's rapid development process. So on top of all that, we needed a way to keep up with it - for example, by having working code samples at all times.

Documentation changes over time as best practices change or new features are added, so we needed to take that into account as well - being able to version the docs and see past revisions. Another important factor was community content - we wanted the community to be able to respond, suggest fixes or additions, and offer new content, even if it is not "documentation" per se.

A wiki sounded like a bit too much, and we didn't really want to build something of our own. We played with some ideas for a while, until we had it all figured out.

Don't reinvent the wheel

The most important rule of all - don't waste your time creating a tool you already have in your belt. In our case those tools are git and Markdown.

With git, we get easy content versioning, and it becomes extremely easy to accept patches from other writers (forks and pull requests). Versioning is simply a matter of branching or tagging - the master HEAD always holds the latest docs for the latest version, and whenever we want to mark a version we just create a branch from the working copy, or tag it. Obviously, GitHub plays a huge role here - our documentation system even has an issue tracker, for heaven's sake...

And of course, Markdown is a natural fit: a super-simple text-based markup language that is very easy to write. Combined with git it really shines, producing very nice diffs. It has never been so easy to track documentation revisions.

Markdown also allows us to export documentation in various formats. On our website we show it as HTML, but it can also be compiled to a PDF book and other e-book formats. We will cover this in detail later in the series.

This is how we got our third-party-hosted, full-featured wiki for documentation, which is available on GitHub: https://github.com/ravendb/docs.

Editorial notes

Now that we had the basics figured out, we needed to decide on a structure. This actually proved very easy to do. Since we use git, all changes, including moving files around, are recorded and can be tracked. So if we represent each documentation item as a file, and store the files under hierarchical folders, we are pretty much done.
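To make the "files under hierarchical folders" idea concrete, here is a minimal sketch of how an outline could be derived from such a tree. It is an illustration only, not the tool from the docs repository: the function name `build_outline`, the accepted file extensions, and the choice to skip hidden entries such as `.git` are all assumptions.

```python
from pathlib import Path
from typing import List

# Assumption: documentation pages are Markdown files with one of these extensions.
DOC_SUFFIXES = {".md", ".markdown"}


def build_outline(root: Path, depth: int = 0) -> List[str]:
    """Return an indented outline: folders become sections, doc files become pages."""
    lines: List[str] = []
    for entry in sorted(root.iterdir(), key=lambda p: p.name.lower()):
        if entry.name.startswith("."):
            continue  # skip hidden entries such as .git
        indent = "  " * depth
        if entry.is_dir():
            lines.append(f"{indent}{entry.name}/")
            lines.extend(build_outline(entry, depth + 1))
        elif entry.suffix.lower() in DOC_SUFFIXES:
            lines.append(f"{indent}{entry.stem}")
    return lines
```

Calling `build_outline(Path("docs"))` on a checkout would return one line per folder or page, indented by nesting level, so moving a file in git is all it takes to move a page in the outline.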

At the time of this writing we still don't have all the terminology figured out - what is a section, a sub-section or a chapter. At this stage we don't really care about all that. We just write the docs in whatever way seems to make sense, and when it is all done we should be able to revisit that.

We also created a Knowledge-Base section on the website, where content that is not "documentation" per se can still be published and viewed. All content that is considered out of scope for the actual documentation will be posted there - official articles by Hibernating Rhinos side by side with user-generated content. The KB is a simple web application and has nothing to do with the documentation system, but in the larger scope - the product - it is important to have, both as a means of providing extra content and for interaction with the community.

Code samples QA

To make sure all the code samples are up to date, we created a project containing all the sample code used in the docs, and we compile it to check that the code is valid. With every new official release, we update the project and compile again, to make sure nothing requires changing. This way the code samples are guaranteed to stay up to date and work - which could never be the case with code that is inlined in the docs themselves.

We added our own Markdown syntax that points to the code file, and we write the code snippets inside named #regions, to make it easier to track and identify the code relevant to each page. If you ever wondered what #regions in VS are for, now you know :)

In the documentation, it looks like this:

{CODE region_name@folder/file.cs /}

And an actual code file with such regions can be found here: https://github.com/ravendb/docs/blob/master/code-samples/Intro/BasicOperations.cs

The custom Markdown syntax is parsed when compiling the docs, before the Markdown itself is resolved, using a tool we developed that is now part of the documentation repository. The tool goes to the source files directory, locates the file, parses out the requested region, normalizes the line spacing and injects the result into the Markdown source. Only then is the source compiled and saved to the specified output.
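The steps described above can be sketched in a few lines. This is not the actual tool from the repository, just a rough Python illustration of the same idea; the names `CODE_TAG`, `extract_region`, `normalize` and `inject_code` are made up for this example, and emitting the snippet as a four-space-indented Markdown code block is an assumption about the output format.

```python
import re
import textwrap
from pathlib import Path
from typing import List

# Matches the tag syntax shown in the post: {CODE region_name@folder/file.cs /}
CODE_TAG = re.compile(r"\{CODE\s+(?P<region>[^@\s]+)@(?P<path>[^\s}]+)\s*/\}")


def normalize(lines: List[str]) -> str:
    """Remove the common indentation and any leading/trailing blank lines."""
    return textwrap.dedent("\n".join(lines)).strip("\n")


def extract_region(source: str, region: str) -> str:
    """Return the code between '#region <region>' and its matching '#endregion'."""
    collected: List[str] = []
    depth = 0  # 0 means we are still outside the requested region
    for line in source.splitlines():
        stripped = line.strip()
        if depth == 0:
            if stripped.split() == ["#region", region]:
                depth = 1
            continue
        if stripped.startswith("#region"):
            depth += 1  # nested region marker: track it, but leave it out
            continue
        if stripped.startswith("#endregion"):
            depth -= 1
            if depth == 0:
                return normalize(collected)
            continue
        collected.append(line)
    raise KeyError(f"region {region!r} not found or not closed")


def inject_code(markdown: str, samples_dir: Path) -> str:
    """Replace every {CODE ...} tag with the region it points to."""
    def replace(match: "re.Match[str]") -> str:
        source = (samples_dir / match.group("path")).read_text(encoding="utf-8")
        snippet = extract_region(source, match.group("region"))
        # Assumption: snippets are emitted as indented Markdown code blocks.
        return textwrap.indent(snippet, "    ")

    return CODE_TAG.sub(replace, markdown)
```

Running `inject_code(page_text, Path("code-samples"))` on a page before handing it to the Markdown compiler keeps the order described above: first the regions are injected, and only then is the Markdown resolved. A missing region raises an error, so a renamed or deleted sample fails the docs build instead of silently producing an empty snippet.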

Next...

In the next posts we will look at how the docs are actually compiled, how they are browsed on the website, and how we allow for versioning of the docs.

28 Jun 2011

Some words on HebMorph’s licensing

Without being a lawyer, and trying real hard not to become one, it is not easy to be the author of an open-source project. Apparently it takes quite a lot of thought, and definitely a lot of reading, to make sure the code you release carries a license that expresses your intent correctly. If you don't pay enough attention, you will probably end up with a license that does not enforce what you intended at all.

This is what happened to me with HebMorph, and this post is here to clarify everything that needs clarifying, and to explain the reasoning behind the recent license change to HebMorph.

Like I said in an e-mail conversation we recently held on HebMorph's mailing list, this project is all about research and sharing of information. We WILL reach our goals, some sooner than others, and when we do, the knowledge we gathered will be free for all to learn from and use. However, since we have a very long road ahead of us, I needed to make sure this project can support itself. I spent a lot of time researching options, charting a path, writing code, testing approaches and a lot more, and to be able to continue doing that in large blocks of time (and not just occasionally) we needed income.

This is when I decided to charge for any commercial use of code released by the HebMorph project. It is actually pretty simple and very fair: I release my work for all to see and use without any charge. If, however, you make a profit from my work, I'd like you to support the project. Since we are aiming at quite a small market, relying on donations won't cut it, so I decided to use a license that allows me to enforce that.

I explicitly stated more than once, and in more than one place, that I'm not after anyone's money. This project grew out of sheer interest, and it will definitely continue to evolve. This is why HebMorph doesn't have a price tag; if you want to use it in a commercial product, contact me and we'll figure something out. An arrangement that is fair for both parties.

Unaware of many legal details, I chose GPLv2 to be HebMorph's license. It seemed promising: any derivative work would require the consuming application to be released under GPLv2 as well, and since most companies would like to avoid that, they would pay for a commercial license. It was also the same license hspell uses, and since some parts of HebMorph are definitely a derivative work of hspell, HebMorph had to be released under a compatible license, or GPLv2 itself. Problem solved - or at least so I thought.

Following a recent user inquiry, I found out my license of choice was in fact not suitable at all. First, it has many flaws and loopholes that make it quite ineffective at enforcing what I wanted it to enforce. It is practically the last license I would choose for any modern software; here's a good read on why.

Secondly, and no less important, any GPLv2 software is incompatible with Lucene/Solr, which is released under the Apache License. Since our main platform is Lucene, we can't afford that.

Now that I have realized all this, I've changed HebMorph's license to AGPLv3. This license is based on GPLv3 (an improvement over GPLv2 in itself), but adds a paragraph that defines "use" in a way that also covers websites and web services, and by that seals off the infamous GPLv2 loophole. Since AGPLv3 isn't compatible with GPLv2, I had to get explicit permission from hspell's authors to still be able to use it, and they gave it - with the restriction that the hspell files distributed with HebMorph may be used only for search purposes.

Now, you may notice how frequently I used the word "fair" when describing the license selection process. This is because I'm not here to run around sealing loopholes, or to make sure anyone who is making a profit from my work is paying back. I enjoy doing other things, not that. I expect users to be fair; if they make a profit from a product that uses HebMorph in one way or another, I expect them to be fair and give back. There are probably thousands of ways to bypass any license, AGPL included, so I'm making it clear that I release HebMorph under the AGPL and also under the expectation of fairness. At some point I was actually considering using RPL, but then I decided it is too restrictive and would probably create more problems than it solves. So I selected AGPLv3, and let me say this again: please act in good faith.

And just to make sure: as far as I'm concerned, using any HebMorph code through Solr is just the same as using it through Lucene. Solr dynamically links the jars, which falls under the very definition of "derivative work", and in case that was in doubt, it isn't now. I'm explicitly specifying this, so even if there is a loophole here (which I'm quite certain there is not), it is now under the license definition of "use": if your application uses Solr, and Solr uses HebMorph, your application is effectively using AGPLv3 software and needs to be AGPLv3 as well.

Hopefully this clarifies some things about HebMorph, and as always I'd love to hear any thoughts on this.

Due to the unintended conflict of licenses, any previous versions of HebMorph being used with Lucene/Solr have to move to the new license.

As before, OSS projects and non-profit closed-source projects are welcome to use HebMorph at no charge, but the latter should contact me in advance to discuss some terms.