Many to many relationships and RavenDB models

RavenDB, RavenDB in Action, EventsZilla

Comments

4 min read

I've recently come across this question in StackOverflow titled "Many to many design for NoSql (RavenDB)", and I thought it is a really good case of a very frequently asked question worth featuring here:

I have two simple objects, 'user' and 'events'. A user can enter many events and an event can be entered by many users - a standard many to many relationship. I'm trying to get out of the relational database mindset!

Here are the queries/actions that I'd like to run against the database:

  • Get me all the events that a user has not entered
  • Get me all the events that a user has entered
  • Update all the events (properties like remaining spaces) very frequently (data polled from various external data sources).
  • Removing events when they've expired

The OP then lists 3 options of solving this:

  1. A "relational approach": Create a new object, that links user and events. For example, a "booking" object, that stores the userId, eventId
  2. Denormalise the events data within the user object, so there is a list of eventIds on the user.
  3. Don't use RavenDB for this, instead use a relational database.

Relational vs RavenDB modeling questions are very common, and the question about many to many relations is one in particular that I'm getting very frequently. That is, when people are aware design decisions differ between relational databases and document databases (and between document databases and RavenDB, and basically between every type of non-relational databases). Its also very common for people to find it hard to escape the relational thinking, as can be evident from the 3 options the OP listed.

I was just finishing writing chapter 5 of RavenDB in Action which discusses document-oriented modeling, so I grabbed my pen (well, keyboard) and wrote down the following answer. It was only a coincidence that I previously discussed almost the exact same domain in the past.

The event-registration domain is a great one for discussing RavenDB modeling, especially because there may be various use cases in play, and use cases dictate the approach to take when deciding on the model. This is pretty much what I tried explaining in my answer, using some nice indexing technique to solve one less-trivial query that was required:

Actually, RavenDB is a perfect fit for that. To properly do that, ask yourself what are the main entities in your model? each one of those will be a document type in RavenDB.

So in your scenario, you'd have Event and User. Then, an Event can have a list of User IDs which you can then easily index and query on. There are more ways to do that, and I actually discussed this in my blog some time in the past with some further considerations that might come up.

The only non-trivial bit is probably the index for answering queries like "all events user has not entered", but that's quite easily done as well:

public class Events_Registrations : AbstractIndexCreationTask<Event>
{
        public Events_Registrations()
        {
                Map = events => from e in events
                              select new { EventId = e.Id, UserId = e.Registrations.SelectMany(x => x.UserId) });

        }
}

Once you have such an index in place, you can do a query like below to get all events a specified user has no registrations to:

var events = RavenSession.Advanced.LuceneQuery<Event, Events_Registrations>()
                                .Where("EventId:*")
                                .AndAlso()
                                .Not
                                .WhereEquals("UserId", userId).ToList();

Then handling things like expiring events etc is very easily done. Definitely don't denormalize event data in the User object, it will make your life living hell.

There's some data missing though - for example how many registrations are allow per event. If there are too many you may want to break it out of the actual Event object, or to revert to Booking objects as you mention. I discuss this in length in my book RavenDB in Action.


Comments

  • Khalid Abuhakmeh

    I think I would create a booking document in a document model, just because storing users in an event can get really large. Imagine one of those events is Coachella (music concert), or the inauguration of the President of the United States. You would literally have 10K to 100K of user identifiers in a single event document. At some point you would be unable to pull that event back in any reasonable amount of time or even modify it.

    Just my 2 cents.

    • Itamar Syn-Hershko

      Khalid,

      This is pretty much what I wrote in the last paragraph of this post. Also, if you read the original EventsZilla post I linked to you will see this was my original advice.

      To sum up our Twitter conversation on the topic: the original requirements pointed to the design advice I've given here, which is a very good starting point and very easy to get up and running with. As I pointed out, there's some missing data in the requirements themselves, hence the advice may or may not be the right one for the actual scenario.

      Having a Booking document is definitely a valid option, but it has some drawbacks you should consider.

      Note however, since it is highly unlikely you will host the registration to a presidential inauguration any time soon it is pretty safe to assume most events you will allow registrations to will hold 100s to 1000s of attendees max. For this amount of registrants it is still fairly easy to handle concurrency, and documents can still be kept small. This will not be the main problem you'll have to deal with, even if its something worth taking note of.

      HTH

Comments are now closed