SQL Database Backend

From GNU MediaGoblin Wiki
Revision as of 05:59, 18 October 2011 by Elrond (talk | contribs) (moved User:Elrond/SQL Database Backend to SQL Database Backend: Somewhat ready for public consumption/editing)
Jump to navigation Jump to search

Introduction

MediaGoblin currently uses MongoDB as its database backend. There have been various reasons for this decision (link). Still, the idea of an SQL backend is coming up from time to time.

...

The existence of this wiki page does not give any evidence of the SQL backend getting developed soon!

People

Even an experimental implementation of an SQL backend, that will only support basic features will need a good amount of time/energy from some (core) developers. That time/energy will not be available to other projects.

  • Some knowing sqlalchemy is needed.
  • Someone having a good idea on the thin db layer is needed (Elrond will most likely be available for helping)
  • Someone having an idea on the current queries used in the code is needed too.

Development models

  • The biggest question is: Do we want to try an experimental sql backend? This is a tricky question. If it fails, it will be a lot of wasted time/energy. And contributors wouldn't be happy to have wasted their time/energy.
  • The wasted time/energy could be limited by limiting the first target. For example one could dedide, that the sql backend is not intended to support the full feature set.
  • Then there is the question on how to develop this:
    1. Long term branch: Develop an SQL replacement in a long term branch. The good: Anything can be changed as needed. Not so good: Some people dislike long term branches for various reasons.
    2. Trying to develop this alongside the current backend, in tree.

Major Issues

The current db model uses dicts to allow flexible storage for extensions and different media types. Representing those in sql is not simple. The dicts are normally keyed by a string and have arbitrary values. There are basicly two ways:

  1. Have a table with a key column (string) and a value column (string, json encoded). This will allow easy reconstruction of the dict for an object fetch and will allow queries on the key of the dict.
    BUT: It will not allow queries on the values. For example the imaginary geolocation extension will store the lat/lon there and might want to search for other media "nearby".
  2. Actually structure the things into distinct tables. The above mentioned imaginary geolocation extension would have its own table with lat/lon fields and be able to perform efficient queries on it.