Why MongoDB

From GNU MediaGoblin Wiki

Chris Webber on "Why MongoDB":

In case you were wondering, I am not a NoSQL fanboy, and I don't go around telling people that MongoDB is web scale. Actually, my reason for choosing MongoDB isn't scalability, though scaling up really nicely is a pretty good feature and sets us up well in case large-volume sites eventually do use MediaGoblin. But there's another side of scalability, and that's scaling down, which is important for federation, maybe even more important than scaling up in an ideal universe where everyone ran servers out of their own homes. As a memory-mapped database, MongoDB is pretty hungry, so I spent a lot of time debating whether its inability to scale down as nicely as SQL does with SQLite meant that it was out. But in the end I decided that I really want MongoDB, not for scalability, but for flexibility. Schema evolution pains in SQL are almost enough reason on their own for me to want MongoDB, but not quite.

Most Importantly

The real reason is that I want the ability to eventually handle multiple media types through MediaGoblin, and also to allow for plugins, without the rigidity of tables making that difficult. In other words, something like:

       {"title": "Me talking until you are bored",
        "description": "blah blah blah",
        "media_type": "audio",
        "media_data": {
            "length": "2:30",
            "codec": "OGG Vorbis"},
        "plugin_data": {
            "licensing": {
                "license": "http://creativecommons.org/licenses/by-sa/3.0/"}}}
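A minimal Python sketch of how such documents might be handled in application code, assuming plain dicts like those a MongoDB driver returns. The second entry, the `geotag` plugin, and the helpers `set_plugin_data` and `get_path` are hypothetical, not MediaGoblin code. Different media types carry differently shaped `media_data`, and each plugin writes only under its own key in `plugin_data`:

```python
# Two hypothetical media entries: same top-level fields, different
# media_data shapes depending on media_type.
audio_entry = {
    "title": "Me talking until you are bored",
    "description": "blah blah blah",
    "media_type": "audio",
    "media_data": {"length": "2:30", "codec": "OGG Vorbis"},
    "plugin_data": {},
}

image_entry = {
    "title": "A goblin",
    "description": "",
    "media_type": "image",
    "media_data": {"width": 640, "height": 480},
    "plugin_data": {},
}


def set_plugin_data(entry, plugin_name, key, value):
    """Hypothetical helper: write a value inside plugin_data[plugin_name] only,
    so a plugin cannot clobber core fields or another plugin's keys."""
    namespace = entry.setdefault("plugin_data", {}).setdefault(plugin_name, {})
    namespace[key] = value
    return entry


def get_path(entry, dotted_key, default=None):
    """Follow a dotted key such as "media_data.codec" through nested dicts,
    returning default if any step is missing."""
    current = entry
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


set_plugin_data(audio_entry, "licensing", "license",
                "http://creativecommons.org/licenses/by-sa/3.0/")
set_plugin_data(audio_entry, "geotag", "latitude", 41.88)

print(get_path(audio_entry, "media_data.codec"))  # OGG Vorbis
print(get_path(image_entry, "media_data.codec"))  # None
```

The dotted lookup mirrors MongoDB's dot notation for embedded fields (for example, a query on `"media_data.codec"`), so the same nesting that keeps plugins apart inside the document can still be queried directly.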


Being able to just dump media-specific information in a media_data hashtable is pretty great, and even better is having a plugin system where each plugin gets its own key-value space cleanly inside the document, one that doesn't interfere with anyone else's stuff. If we were to let plugins deposit their own information in the database, either we'd let plugins create their own tables, which would make SQL migrations even harder than they already are, or we'd probably end up with one huge table called "plugin_data" or something similar, with a column for key, a column for value, and a column for type. (Yo dawg, I heard you liked plugins, so I put a database in your database so you can query while you query.) Gross.
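For comparison, here is roughly what the "one huge table" fallback looks like, sketched with Python's built-in `sqlite3` module. The table layout, the plugin names, and the `CASTS` mapping are illustrative assumptions. Every value is stored as text alongside a type tag and has to be decoded by hand when it is read back:

```python
import sqlite3

# Hypothetical generic table: every plugin's values live here as
# (entry, plugin, key, value, type) rows, a database inside the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE plugin_data (
           media_entry_id INTEGER NOT NULL,
           plugin TEXT NOT NULL,
           key TEXT NOT NULL,
           value TEXT,
           type TEXT NOT NULL
       )"""
)

rows = [
    (1, "licensing", "license",
     "http://creativecommons.org/licenses/by-sa/3.0/", "str"),
    (1, "geotag", "latitude", "41.88", "float"),
    (1, "geotag", "longitude", "-87.63", "float"),
]
conn.executemany("INSERT INTO plugin_data VALUES (?, ?, ?, ?, ?)", rows)

# Values come back as text; the type column says how to convert them.
CASTS = {"str": str, "float": float, "int": int}


def load_plugin_data(conn, media_entry_id, plugin):
    """Rebuild one plugin's key/value dict for one media entry."""
    cursor = conn.execute(
        "SELECT key, value, type FROM plugin_data "
        "WHERE media_entry_id = ? AND plugin = ?",
        (media_entry_id, plugin),
    )
    return {key: CASTS[type_](value) for key, value, type_ in cursor}


geotag = load_plugin_data(conn, 1, "geotag")
print(geotag["latitude"], geotag["longitude"])
```

Storing the same geotag as a nested `{"latitude": 41.88, "longitude": -87.63}` under `plugin_data` keeps the native types and needs no decoding step. With the key/value table, a query that filters on two attributes at once also tends to require joining the table to itself.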

Keeping It Clean

I also don't want things to be so loose that we forget or lose the structure of things, and that's one reason I want to use MongoKit: we can cleanly define as much structure as we want and verify that documents generally match that structure without adding too much bloat or overhead. (MongoKit is a pretty lightweight wrapper and doesn't inject extra MongoKit-specific stuff into the database, which makes it nicer than many other ORMs in that respect.)
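MongoKit's actual class syntax is not reproduced here. As a rough plain-Python illustration of the idea, assuming a structure written as a dict of Python types, the sketch below checks a document against a declared shape while leaving `media_data` and `plugin_data` deliberately open as arbitrary dicts. `MEDIA_ENTRY_STRUCTURE` and `validate` are hypothetical names:

```python
# Hypothetical structure: Python types for leaf values, nested dicts for
# fixed sub-documents, and a bare `dict` type where contents are left open.
MEDIA_ENTRY_STRUCTURE = {
    "title": str,
    "description": str,
    "media_type": str,
    "media_data": dict,   # shape depends on media_type
    "plugin_data": dict,  # each plugin owns its own key
}


def validate(document, structure, path=""):
    """Return a list of problems; an empty list means the document fits."""
    problems = []
    for key, expected in structure.items():
        where = path + key
        if key not in document:
            problems.append(f"missing field: {where}")
            continue
        value = document[key]
        if isinstance(expected, dict):
            # A nested structure: recurse into the sub-document.
            if not isinstance(value, dict):
                problems.append(f"{where}: expected a sub-document")
            else:
                problems.extend(validate(value, expected, where + "."))
        elif not isinstance(value, expected):
            problems.append(
                f"{where}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Fields the structure does not mention are reported, not silently kept.
    for key in document:
        if key not in structure:
            problems.append(f"unexpected field: {path + key}")
    return problems


entry = {
    "title": "Me talking until you are bored",
    "description": "blah blah blah",
    "media_type": "audio",
    "media_data": {"length": "2:30", "codec": "OGG Vorbis"},
    "plugin_data": {},
}
print(validate(entry, MEDIA_ENTRY_STRUCTURE))   # []
broken = dict(entry, title=42)
print(validate(broken, MEDIA_ENTRY_STRUCTURE))  # ['title: expected str, got int']
```

Because the open sub-documents are only checked for being dicts, media types and plugins keep their flexibility, while the core fields shared by every entry are still enforced.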