Code972 Coding from the back of a camel

30 Jun 2011

Practical Hebrew search – Open2011 presentation

Attached to this post is the presentation I gave today at Open2011 in Tel-Aviv.

The sample app can be found here: http://hebmorph.code972.com/. It will also become HebMorph's home in a few weeks, once I'm done generating all the necessary content.

As promised, I will be posting more details on some interesting findings on Hebrew search, and comparisons with Google search. I want those posts to be a bit more comprehensive, so they will be up in a few weeks' time.



28 Jun 2011

Some words on HebMorph’s licensing

Without being a lawyer, and trying real hard not to become one, it is not easy to be an author of an open-source project. Apparently it takes quite a lot of thought, and definitely a lot of reading, to make sure the code you release has an appropriate license that specifies your intent correctly. If you don't pay enough attention, you will probably end up with a license that does not enforce what you intended at all.

This is what happened to me with HebMorph, and this post is here to clarify everything that needs clarifying, and to explain the reasoning behind the recent license change to HebMorph.

As I said in an e-mail conversation we recently had on HebMorph's mailing list, this project is all about research and sharing of information. We WILL reach our goals, some sooner than others, and when we do, the knowledge we gathered will be free for all to learn from and use. However, since we have a very long road ahead of us, I needed to make sure this project can support itself. I spent a lot of time researching options, charting a path, writing code, testing approaches and a lot more, and to be able to keep doing that in large blocks of time (and not just occasionally) we needed income.

This is when I decided to charge for any commercial use of code released by the HebMorph project. It is actually pretty simple and very fair: I release my work for all to see and use without any charge. If, however, you make a profit from my work, I'd like you to support the project. Since we are aiming at quite a small market, relying on donations won't cut it, so I decided to use a license which will allow me to enforce that.

I explicitly stated more than once, and in more than one place, that I'm not after anyone's money. This project grew out of sheer interest, and it will definitely continue to evolve. This is why HebMorph doesn't have a price tag; if you want to use it in a commercial product, contact me and we'll figure something out. An arrangement that is fair for both parties.

Unaware of many legal details, I chose GPLv2 to be HebMorph's license. It seemed promising: any derivative work would require the consuming application to be released under GPLv2 as well, and since most companies would like to avoid that - they would pay for a commercial license. It also was the same license hspell is using, and since some parts of HebMorph are definitely a derivative work of hspell, it required HebMorph to be released under a compatible license, or GPLv2 itself. Problem solved - or at least so I thought.

Following a recent user inquiry, I found out my license of choice was in fact not suitable at all. First, it has many flaws and loopholes making it quite ineffective in enforcing what I wanted it to. It is practically the last license I would choose for any modern software; here's a good read on why.

Secondly, and no less important, GPLv2 software is incompatible with Lucene/Solr, which is released under the Apache Software License. Since our main platform is Lucene, we can't afford that.

Now that I have realized all this, I've changed HebMorph's license to AGPLv3. This license is based on GPLv3 (in itself an improvement over GPLv2), but adds a paragraph that defines "use" in a way that also covers websites and web services, and by that seals off the infamous GPLv2 loophole. Since AGPLv3 isn't compatible with GPLv2, I had to get explicit permission from hspell's authors to still be able to use it, and they granted it, with the restriction that the hspell files distributed with HebMorph may be used for search purposes only.

Now, you may have noticed how frequently I used the word "fair" when describing the license selection process. This is because I'm not here to chase and seal loopholes, or to make sure everyone who profits from my work pays back. I enjoy doing other things, not that. I expect users to be fair; if they make a profit from a product that uses HebMorph in one way or another, I expect them to give back. There are probably thousands of ways to bypass any license, AGPL included, so I'm making it clear that I release HebMorph under the AGPL and also under the expectation of fairness. At some point I was actually considering RPL, but then decided it is too restrictive and would probably create more problems than it solves. So I selected AGPLv3, and let me say this again: please act in good faith.

And just to make sure: as far as I'm concerned, using any HebMorph code through Solr is just the same as using it through Lucene. Solr dynamically links the jars, which falls under the very definition of "derivative work", and in case that was in doubt, it isn't now. I'm stating this explicitly, so even if there is a loophole here (which I'm quite certain there is not), it is now covered by the license's definition of "use": if your application uses Solr, and Solr uses HebMorph, your application is effectively using AGPLv3 software and needs to be AGPLv3 as well.

Hopefully this clarifies some things about HebMorph, and as always I'd love to hear any thoughts on this.

Due to the unintended conflict of licenses, any previous versions of HebMorph being used with Lucene/Solr have to move to the new license.

As before, OSS projects and non-profit closed source projects are welcome to use HebMorph with no charge, but the latter should contact me in advance to discuss some terms.

26 Jun 2011

FastVectorHighlighter issues revisited

In a previous post I described how to use FVH to highlight contents which went through filters / readers like HTMLStripCharFilter in the analysis process. As DIGY spotted right away in the comments, my approach was all wrong. Yes, I knew any CharFilter or Tokenizer implementation would store term positions and offsets that take into account any skips done in the content, but since it didn't work for me I didn't care to look any deeper, just made that workaround, and then rushed to blog about it.

So, don't use that. Instead, rely on your analyzer to store positions and offsets and on FVH to use them correctly when highlighting. As it happens, the custom analyzers I used suffered from a nasty bug that was not allowing them to consider skips. Now that I fixed that, it all works like a charm.
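
To make the point about offsets concrete, here is a minimal, self-contained sketch of what "taking skips into account" means. It is not Lucene code: `OffsetDemo` and `Strip` are hypothetical names, and the tag stripping is deliberately naive (anything between `<` and `>` is dropped). The idea is that for every character kept, the filter remembers where it sat in the stored text, so a token found in the stripped text can be mapped back to the right place in the original:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OffsetDemo
{
	// Strips <...> tags and records, for every character that is kept,
	// its offset in the original (stored) text.
	public static string Strip(string html, List<int> originalOffsets)
	{
		var sb = new StringBuilder(html.Length);
		bool withinTag = false;
		for (int i = 0; i < html.Length; i++)
		{
			char c = html[i];
			if (c == '<') { withinTag = true; continue; }
			if (c == '>') { withinTag = false; continue; }
			if (withinTag) continue;
			sb.Append(c);
			originalOffsets.Add(i);
		}
		return sb.ToString();
	}

	public static void Main()
	{
		const string stored = "<p>Hello <b>world</b></p>";
		var map = new List<int>();
		string stripped = Strip(stored, map);

		// Offset of the token in the stripped text, as a tokenizer would see it
		int start = stripped.IndexOf("world", StringComparison.Ordinal);
		// Corrected offset, pointing into the stored text
		int correctedStart = map[start];

		Console.WriteLine(stripped);
		Console.WriteLine(stored.Substring(correctedStart, "world".Length));
	}
}
```

A highlighter that works against the stored HTML needs the corrected offset, not the one measured in the stripped text; an analyzer that reports the latter will produce fragments that are shifted by the length of every tag skipped so far.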

However, two issues still remained. First, since my stored fields contain HTML, the fragments may contain HTML tags as well, sometimes partial ones. In many cases the fragment that will end up on your webpage would ruin the page layout because of a stubborn misplaced </div> tag that found its way to the fragment. Escaping all <'s and >'s is not a really good solution - you don't really want your fragments to contain ugly looking HTML tags.

The second issue was duplicate content. I wanted to process the content more than once, indexing it with 2 or more analyzers, but didn't want to store it more than once since it was exactly the same content. To still be able to highlight on those other fields as well, I needed FVH to allow me to specify a field name to pull the stored contents from.

Solving the first problem was quite easy, and required nothing more than a simple extension function. It is called on the fragment string after receiving it from FVH. To be on the safe side, I made sure to ask for a larger fragment than I originally intended, so even if a lot of HTML noise is present, some context will remain in the fragment:

using System.Text;

// Extension methods must live in a static class; the class name here is arbitrary
public static class FragmentExtensions
{
	public static string HtmlStripFragment(this string fragment)
	{
		if (string.IsNullOrEmpty(fragment)) return string.Empty;

		var sb = new StringBuilder(fragment.Length);
		bool withinHtml = false, first = true;
		foreach (var c in fragment)
		{
			if (c == '>')
			{
				// A '>' seen before any '<' closes a tag that was cut off at the
				// start of the fragment, so drop everything collected so far
				if (first) sb.Length = 0;
				withinHtml = false;
				first = false;
				continue;
			}
			if (withinHtml)
				continue;
			if (c == '<')
			{
				first = false;
				withinHtml = true;
				continue;
			}
			sb.Append(c);
		}

		// FVH was instantiated with "[b]" and "[/b]" as pre- and post-tags for highlighting,
		// so they won't get lost in translation
		return sb.Append("...").Replace("[b]", "<b>").Replace("[/b]", "</b>").ToString();
	}
}

The second issue was solved by subclassing FragmentsBuilder, only this time it was a bit less intrusive:

public class CustomFragmentsBuilder : BaseFragmentsBuilder
{
	public string ContentFieldName { get; protected set; }

	/// <summary>
	/// a constructor.
	/// </summary>
	public CustomFragmentsBuilder()
	{
	}

	public CustomFragmentsBuilder(string contentFieldName)
	{
		ContentFieldName = contentFieldName;
	}

	/// <summary>
	/// a constructor.
	/// </summary>
	/// <param name="preTags">array of pre-tags for markup terms</param>
	/// <param name="postTags">array of post-tags for markup terms</param>
	public CustomFragmentsBuilder(String[] preTags, String[] postTags)
		: base(preTags, postTags)
	{
	}

	public CustomFragmentsBuilder(string contentFieldName, String[] preTags, String[] postTags)
		: base(preTags, postTags)
	{
		ContentFieldName = contentFieldName;
	}

	/// <summary>
	/// do nothing. return the source list.
	/// </summary>
	public override List<WeightedFragInfo> GetWeightedFragInfoList(List<WeightedFragInfo> src)
	{
		return src;
	}

	protected override Field[] GetFields(IndexReader reader, int docId, string fieldName)
	{
		var field = ContentFieldName ?? fieldName;
		var doc = reader.Document(docId, new MapFieldSelector(new[] {field}));
		return doc.GetFields(field); // according to Document class javadoc, this never returns null
	}
}

And as always the usual disclaimer applies - this isn't necessarily the best way to do this, and I'd definitely like to hear of more elegant ways to achieve that if such exist.

19 Jun 2011

Custom tokenization and Lucene’s FastVectorHighlighter

NOTE: The approach described below is wrong; you may want to read the follow-up post.

Perhaps you have tackled this before: you wanted to use Lucene's FastVectorHighlighter (aka FVH), but since you have a custom CharFilter in your analysis chain, the highlighter fails to produce valid fragments.

In my particular case, I used HTMLStripCharFilter (available to Lucene.Net through my pet contrib project) to extract text content from HTML pages, and then pass it through the rest of the analysis process. This confused FVH, since it was taking the full content from the stored field, where HTML was still present, and token positions were not taking that into account. Any other custom CharFilter added to the analysis chain is going to cause the same trouble.

To overcome this, I needed to make sure FVH is aware of all content stripping operations that are made before or while tokenization is happening. All I had to do was to implement a custom FragmentsBuilder, looking as follows (.Net code; a Java version would look almost identical):

public class HtmlFragmentsBuilder : BaseFragmentsBuilder
{
	/// <summary>
	/// a constructor.
	/// </summary>
	public HtmlFragmentsBuilder()
		: base()
	{
	}

	/// <summary>
	/// a constructor.
	/// </summary>
	/// <param name="preTags">array of pre-tags for markup terms</param>
	/// <param name="postTags">array of post-tags for markup terms</param>
	public HtmlFragmentsBuilder(String[] preTags, String[] postTags)
		: base(preTags, postTags)
	{
	}

	/// <summary>
	/// do nothing. return the source list.
	/// </summary>
	public override List<WeightedFragInfo> GetWeightedFragInfoList(List<WeightedFragInfo> src)
	{
		return src;
	}

	protected override String GetFragmentSource(StringBuilder buffer, int[] index, Field[] values, int startOffset, int endOffset)
	{
		string fieldText;
		while (buffer.Length < endOffset && index[0] < values.Length)
		{
			fieldText = GetFilteredFieldText(values[index[0]]);
			if (index[0] > 0 && values[index[0]].IsTokenized() && fieldText.Length > 0)
				buffer.Append(' ');
			buffer.Append(fieldText);
			++(index[0]);
		}
		var eo = buffer.Length < endOffset ? buffer.Length : endOffset;
		return buffer.ToString().Substring(startOffset, eo - startOffset);
	}

	/// <summary>
	/// Gets the field text, after applying custom filtering
	/// </summary>
	/// <param name="field"></param>
	/// <returns></returns>
	protected string GetFilteredFieldText(Field field)
	{
		var theStream = new MemoryStream(Encoding.UTF8.GetBytes(field.StringValue()));
		var reader = CharReader.Get(new StreamReader(theStream));
		reader = new HTMLStripCharFilter(reader);

		int r;
		var sb = new StringBuilder();
		while ((r = reader.Read()) != -1)
		{
			sb.Append((char)r);
		}
		return sb.ToString();
	}
}

FVH will then need to be configured to use it:

var fvh = new FastVectorHighlighter(FastVectorHighlighter.DEFAULT_PHRASE_HIGHLIGHT,
					FastVectorHighlighter.DEFAULT_FIELD_MATCH,
					new SimpleFragListBuilder(), new HtmlFragmentsBuilder());
// ...
var fq = fvh.GetFieldQuery(query);
var fragment = fvh.GetBestFragment(fq, searcher.GetIndexReader(), hits[i].doc, "Content", 300);

If you're using Lucene.Net, you'll have to make sure this patch is applied to your FVH before this could compile.

That was the easiest way to get this working, and fast. Perhaps I could make it more generic, or change the original implementation to allow that and submit it as a patch. Maybe I'll do it someday. Or you could...

16 Jun 2011

Announcing: Lucene.Net.Contrib

Whenever you start doing real-world stuff with Lucene you find yourself hacking and extending. That's the beauty of Lucene - it has so many extension points, and you can write almost every part of it from scratch to match your requirements.

Lately I've been working on some stuff relating to both RavenDB and HebMorph (separately...), and it became quite annoying keeping track of Lucene.Net extensions that are not part of the core project. In fact, several contrib packages (or rather: projects) that are part of the original Lucene.Net project are hardly maintained and are not so friendly to use.

So, I thought it was time to give all those a home. I created a new github repository called Lucene.Net.Contrib, where all those enhancements, large or small, should go. Once there's enough to go on, I'll create a nuget package and make it easily accessible.

Having a centralized location for all those has nothing but benefits. Bugs can be found and fixed, a lot of time can be saved by just checking whether someone has already ported or written the stuff you need, and most important of all: finding new opportunities. Java Lucene has had all that for quite some time now, and since I've been doing Lucene.Net a lot lately, I thought I'd make my small contribution...

This is not trying to compete with Lucene.Net's contrib section; it is just intended to be a much more flexible, fast-growing collection of extensions, most of which will probably be small in size.

What's currently there (not much - and only analysis/search related):

  • HTMLStripCharFilter - by plugging this into the analysis chain you can get any analyzer to strip all HTML tags and take the skipped positions into consideration (useful for later highlighting).
  • ReverseStringFilter - reverses a string; useful for cases where you need to allow leading wildcards and never trailing wildcards.
  • BinaryCoordSimilarity - a Lucene Similarity configuration which, in a multi-word query scenario, penalizes all results that do not contain ALL search terms.
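
The leading-wildcard trick behind ReverseStringFilter can be sketched without any Lucene code. The names below (`ReversedWildcardDemo`, `Reverse`, `ToReversedPattern`) are hypothetical and only illustrate the idea: if terms are indexed reversed, a query with a leading wildcard such as `*ing` can be rewritten into a cheap trailing-wildcard (prefix) query, `gni*`, against the reversed terms.

```csharp
using System;

public static class ReversedWildcardDemo
{
	// Reverses a term the way a reversing token filter would.
	// Note: this naive version does not handle surrogate pairs.
	public static string Reverse(string term)
	{
		char[] chars = term.ToCharArray();
		Array.Reverse(chars);
		return new string(chars);
	}

	// Rewrites a leading-wildcard pattern such as "*ing" into the
	// trailing-wildcard pattern "gni*", to be run against reversed terms.
	public static string ToReversedPattern(string pattern)
	{
		if (string.IsNullOrEmpty(pattern) || pattern[0] != '*')
			throw new ArgumentException("Expected a pattern starting with '*'", "pattern");
		return Reverse(pattern.Substring(1)) + "*";
	}

	public static void Main()
	{
		string indexedTerm = Reverse("searching");          // "gnihcraes"
		string reversedPattern = ToReversedPattern("*ing"); // "gni*"

		// A trailing wildcard is just a prefix match on the reversed term
		string prefix = reversedPattern.TrimEnd('*');
		Console.WriteLine(indexedTerm.StartsWith(prefix, StringComparison.Ordinal)); // True
	}
}
```

In a real index the reversed terms would typically go to a separate field, so regular and trailing-wildcard queries keep working against the original one.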

Other stuff that is probably going to be included (or makes sense to):

All code is released under the same Apache license as Lucene and Lucene.Net's, unless otherwise specified (but only permissive licenses are allowed in).

Have you put your Lucene.Net extensions in yet? Fork away!

GitHub repo: https://github.com/synhershko/Lucene.Net.Contrib

12 Jun 2011

Some updates on NAppUpdate

After having several issues with their auto-update mechanism, 2 weeks ago the Hibernating Rhinos profilers were updated to use NAppUpdate. Once again it was proven to be a very flexible and robust library, and several updates were already pushed to hundreds (thousands?) of users without any problem.

Before the profilers could start using NAppUpdate I had to make some updates to the library, namely: catch and expose the last error thrown (if any); fix an issue with UAC popping for updates on Windows 7 and Vista; better support for promptly cancelling a download mid-way; and a few other fixes and updates. These fixes are already available on github, and probably invalidate the 0.1 release...

Implementing NAppUpdate required a custom implementation of a FeedReader and a Task, and the whole process didn't take more than one hour to code (testing is another story...). The profiler's AutoUpdateFeedReader makes a simple check against a very simple one-liner feed with the profiler's current version, and its AutoUpdateTask downloads the latest build as a zip file from the server, extracts it to a temporary folder and, when told to, overwrites the old files with the new ones in bulk.
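
The feed check itself is not shown in this post, so the following is only a sketch of what such a check could look like, under loud assumptions: the feed body is taken to be a single dotted version string (for example "1.0.0.812"), and `VersionFeedCheck` and `IsUpdateAvailable` are hypothetical names that are not part of NAppUpdate or the profilers.

```csharp
using System;

public static class VersionFeedCheck
{
	// feedContent: the body of the one-liner feed, assumed to hold only a
	// dotted version string. installed: the version currently running.
	public static bool IsUpdateAvailable(string feedContent, Version installed)
	{
		Version latest;
		if (feedContent == null || !Version.TryParse(feedContent.Trim(), out latest))
			return false; // unreadable feed: treat as "no update" rather than failing

		return latest > installed;
	}
}
```

A custom feed reader built around a check like this would hand back an update task only when the method returns true; treating an unreadable feed as "no update" is a design choice made for this sketch, not something taken from the original code.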

The actual task looks something like this - note the logical separation into steps, which are executed sequentially:

public bool Prepare(IUpdateSource source)
{
	// Clear temp folder
	if (Directory.Exists(updateDirectory))
	{
		try
		{
			Directory.Delete(updateDirectory, true);
		}
		catch {}
	}

	Directory.CreateDirectory(updateDirectory);

	// Download the zip to a temp file that is deleted automatically when the app exits
	string zipLocation = null;
	try
	{
		if (!source.GetData(LatestVersionDownloadUrl, string.Empty, ref zipLocation))
			return false;
	}
	catch (Exception ex)
	{
		Log.Error("Cannot get update package from source", ex);
		throw new UpdateProcessFailedException("Couldn't get Data from source", ex);
	}

	if (string.IsNullOrEmpty(zipLocation))
		return false;

	// Unzip to temp folder; no need to delete the zip file as this will be done by the OS
	return Extract(zipLocation);
}

public bool Execute()
{
	// since all we do is a cold update, nothing other than backup needs to happen here

	return true;
}

public IEnumerator<KeyValuePair<string, object>> GetColdUpdates()
{
	if (filesList == null)
		yield break;

	foreach (var file in filesList)
	{
		yield return new KeyValuePair<string, object>(file, Path.Combine(updateDirectory, file));
		Log.DebugFormat("Registering file {0} to be updated with {1}", file, Path.Combine(updateDirectory, file));
	}
}

Triggering the actual check for updates is a one-liner (after configuring the UpdateManager instance with a feed URL, a FeedReader and all that; the task is created and returned by the custom FeedReader):

UpdateManager.Instance.CheckForUpdateAsync(StartDownloadingUpdate);

// ...

private void StartDownloadingUpdate(int updates)
{
	if (updates == 0) // no updates are available
		return;

	if (updates < 0) // an error has occurred
	{
		Log.ErrorFormat("Error while checking for updates: {0}", UpdateManager.Instance.LatestError);
		return;
	}

	// If updates are found, start downloading them async
	UpdateManager.Instance.PrepareUpdatesAsync(success =>
	{
		if (!success)
		{
			if (UpdateManager.Instance.LatestError != null)
			{
				Log.ErrorFormat("Error downloading updates: {0}", UpdateManager.Instance.LatestError);
			}
			return;
		}

		// Notify the user of the update, and call UpdateManager.Instance.ApplyUpdates() when ready
	});
}

It couldn't be simpler than that, and it just works...

This has triggered some interest in the project, and wheels are now in motion again and hopefully new features will be introduced soon, followed by a 0.2 release.

As always, you can grab the sources and file bugs here. Bugs and feature-requests can also be submitted to the mailing list.

16 May 2011

SisoDB: The wrong solution to the wrong problems

Data structures are the cornerstone of computing. If you get them right, you will most probably succeed in your mission of delivering an application that uses its resources wisely and performs well.

In modern computing, most data is stored to and retrieved from databases. Databases are data structures' big brothers - they serve the same purpose but with added value. Choosing one wisely can greatly help you in so many ways; going with the wrong one would cost you too much.

This is why one should not take lightly the decision of which database solution to use.

Dealing with data explosion

Since the 70's, whenever data had to be persisted, RDBMSes were the most effective and trusted tools to use. Once OOP became dominant, developers found it quite irritating to stuff their hierarchical entities into the flat structure of tables and rows, which is the ABC of RDBMSes. This is how ORMs came to life.

Thinking about it in retrospect, ORMs were never the solution. They just made the problem less irritating. In practice, your data still had to go through an awful lot of processing before it was persisted to, or loaded from, the store. But as long as this was transparent to the developers, and they knew that loads of optimization was happening under the hood, it seemed like there was nothing to worry about.

Although the concepts had been known since the 80's, it was not until recent years that real object, document and graph databases came to life. It took big players like Facebook and Twitter to get those ideas to mature and become production ready. Someone (or a handful of people) realized a shift in thinking was essential, and real-world problems like replication and sharding suddenly seemed a lot less complicated. As a result the NoSQL movement (or whatever it has become) is now going full steam ahead, and data-access best practices are being rewritten.

Each NoSQL brand introduces some cool unique features, never seen in RDBMSes before. Document-oriented databases introduced the "schema-less" concept. That is, unlike in traditional RDBMSes, defining a data schema is no longer required. The DocDB either figures it out on its own, or doesn't even bother to. Data schemas are required in RDBMSes to define the table structure and allow for efficient indexing; DocDBs have a different go at it: Map/Reduce.

SisoDB choosing the wrong battle

SisoDB is the new face in town, but it looks like it is choosing the wrong battle. The problems it tries to solve are not real problems. Let me explain.

The SisoDB website explains the motivation behind SisoDB: the need for a real schema-less solution for data storage, while at the same time making sure the powerful tools offered by SQL Server are still available. ORMs are deemed evil because they require mappings, which contradicts the notion of schema-less, and non-MS-SQL backends are probably deemed irrelevant too. This is probably why there are no providers for Oracle or MySQL.

So, in SisoDB data is now schema-less, but it spans over 3 tables per entity. This is how it looks (taken from the SisoDB site):

And the question arises: if a real schema-less database is what you're after, a direct POCO-to-storage-and-back-again solution, why would you use SisoDB with SQL Server in the first place? You can just use a schema-less NoSQL database, and if you treasure MSSQL's reporting tools that much, just find a way to keep using them! By not using a NoSQL database, you lose ALL the sweet spots such products have to offer, none of which MSSQL offers. And there are so many of them.

Nowadays it doesn't make sense to use SisoDB, neither in new development nor in existing applications. It may feel schema-less, but its foundations are too deep in the RDBMS world, and it shows. To name a few examples:

  • Deep hierarchies and enumerables are not supported
  • Entity ids must be named SisoDb, making it harder to integrate with existing code
  • You can't specify string ids for entities (ids have to be ints or Guids)
  • You have to CREATE your databases
  • For every model change you have to tell SisoDB to update the model; it will not be detected automatically, and a schema update is still required.
  • Various common SQL pitfalls, like SELECT N+1 or not batching where possible.
  • Sharding and replication, other strong characteristics of NoSQL databases, are by definition miles behind.

Some performance numbers were posted by the author, comparing SisoDB and other ORMs for inserts. But queries are what you should really care about, and there you are going to be disappointed. The most extensive indexing feature SQL has, relations between tables, is not used in SisoDB by design. SisoDB doesn't define FKs, and doesn't perform JOINs. Put simply, this means that by design SisoDB harms lookup performance, which is hands down the most crucial part of your application. You don't want this.

Just for comparison: RavenDB is a document database written in .NET. It is schema-less too, works with POCOs or raw JSON with no mapping whatsoever, and uses Linq for querying. But it is real NoSQL, and as such it offers much more natural replication and sharding functionality. Other features include full-text search out-of-the-box, entity versioning, a REST API, complex and super-fast indexes, embedded mode, Silverlight support, and much more. And RavenDB comes with the ability to replicate its indexes to MSSQL, so the reporting tools can still be used even though you're in NoSQL land.

If you were able to convince your bosses to use NoSQL, go with a real NoSQL solution. If not, try again. If you still fail, just keep using your favorite ORM and if mapping annoys you find ways to automate that process instead.

30 Mar 2011

NAppUpdate now has a mailing list

I'm getting a lot of feedback on NAppUpdate, and obviously there's plenty to discuss on how to make it better and bug-free.

So I went ahead and created a mailing list / Google group dedicated to NAppUpdate issues, suggestions and discussions. I'll be posting all future announcements to the list, too. All issues with implementing NAppUpdate in your solution, or thoughts on missing features, should be addressed there.

The group is here: http://groups.google.com/group/nappupdate.

All development work, and feature / issue tracking is on our github repository: https://github.com/synhershko/NAppUpdate. Source and binary downloads can be made from there.

Let's make application updates in .NET a breeze.

28 Mar 2011

So, what have I been up to lately?

To all of those who asked: NAppUpdate and HebMorph aren't dead.

The last few months have been quite hectic for me in several aspects, and this is mainly why I wasn't able to make any real progress with them, not to mention blogging. I still have big plans for both, and several other unrelated plans too. Things are becoming more relaxed now, so hopefully I could find time to work on all the ideas I have running in my head...

I recently joined Ayende's Hibernating Rhinos, and am currently working on RavenDB. So, it is inevitable that some NoSQL/RavenDB ideas and posts will pop up.

NAppUpdate is working very well for simple uses, which is exactly what I intended when I first started working on it. However, judging by the stats I see, there is quite a bit of interest in the tool, and in the features it is yet to offer. So I'm definitely planning on enhancing and improving it as time allows - feel free to jump in if you want to help. Work items I have planned include stabilizing the API and NauXML format, task groups, logging and reporting, UI, and better handling of dependencies in task execution.

As for HebMorph, well, that is a project I'm deeply in love with, and as such it will be the first to get my attention. I'm getting a lot of positive feedback, but there's still so much work left to do - even though what we have now is completely functional. I'll be giving a short talk on HebMorph and Hebrew search at Penguin Israel 2011 (location and date TBD), and until then I'm really hoping to get a few surprises ready. Stay tuned, I'll be blogging about them as I go.

15 Feb 2011

Thoughts on WPF / Silverlight

While developing quite a simple Silverlight application a few months ago I noticed how laggy it can become. Now I came across this article, and having realized I wasn't imagining things, I'm starting to see how immature WPF and Silverlight are. And that is without mentioning some severe bugs WCF has - and you can't use Silverlight properly without a WCF host.

Here are a few snippets from the article and comments:

How many times have you had to scale back you UI because it was too jerky? How many times have you came up with the “groundbreaking new UX model” that you had to scrap because the technology couldn’t handle it? How many times have you told a customer they require a 2.4ghz quad core to get the full experience? I’ve been asked by customers why they cannot deliver the same fluid UX they have on their iPad application using WPF or Silverlight on a PC with four times the horses. This technology may be good enough for line-of-business applications, but it falls short of being able to deliver a next generation consumer application.

I get the same question from other developers and management: “why do phone applications running on dinky ARM processors feel so much smoother?”

Then I will see some cool HTML5 canvas example posted on Reddit, check the CPU loading it causes, and think there’s no way I could do that in WPF without 2x the loading. Even jquery widgets seem to behave more responsively than my WPF app.

My 2 cents: if you're looking to create a rich UI for the desktop, take a look at HTMLayout (.NET bindings here); I find the way XAML data bindings work too clumsy anyway. For web experiences, you should be all set with jQuery and HTML5; if you're looking to create games you should use Flash.