SQL Database Backend

From GNU MediaGoblin Wiki
Jump to navigation Jump to search

Introduction

MediaGoblin currently uses MongoDB as its database backend. There have been various reasons for this decision (see: Why MongoDB, yes really, look at it. It has a lot of insight what are the problems with sql). Still, the idea of an SQL backend is coming up from time to time.

...

The existence of this wiki page does not give any evidence of the SQL backend getting developed soon!

People

Even an experimental implementation of an SQL backend, that will only support basic features will need a good amount of time/energy from some (core) developers. That time/energy will not be available to other projects.

  • Some knowing sqlalchemy is needed.
  • Someone having a good idea on the thin db layer is needed (Elrond will most likely be available for helping)
  • Someone having an idea on the current queries used in the code is needed too.

Development models

  • The biggest question is: Do we want to try an experimental sql backend? This is a tricky question. If it fails, it will be a lot of wasted time/energy. And contributors wouldn't be happy to have wasted their time/energy.
  • The wasted time/energy could be limited by limiting the first target. For example one could dedide, that the sql backend is not intended to support the full feature set.
  • Then there is the question on how to develop this:
    1. Long term branch: Develop an SQL replacement in a long term branch. The good: Anything can be changed as needed. Not so good: Some people dislike long term branches for various reasons.
    2. Trying to develop this alongside the current backend, in tree.

Major Issues

The current db model uses dicts to allow flexible storage for extensions and different media types. Representing these data in SQL is not simple. The dicts are normally keyed by a string and have arbitrary values. There are basically two ways:

  1. Have a table with a key column (string) and a value column (string, json encoded). This will allow easy reconstruction of the dict for an object fetch and will allow queries on the key of the dict.
    However, it will not allow queries on the values. For example the imaginary geolocation extension will store the lat/lon there and might want to search for other media "nearby".
    If the value column is split into three columns (i.e. value_int value_string value_json), some queries, such as those on numbers, would be possible. .
    Note: The table would actually look something like this: create table super_data_table(media_id references media(id), plugin_id int, key, value_int, value_string); ... note that this design was predicted, determined by Chris Webber to be gross, and was aimed to be avoided, hence the choice of MongoDB. See Why MongoDB for details.
  2. Actually structure the things into distinct tables. The above mentioned imaginary geolocation extension would have its own table with lat/lon fields and be able to perform efficient queries on it.

Some sketchup in SQL

create table users(
   id integer primary key not null,
   username varchar(30) unique not null,
   email text unique,
   created timestamp default CURRENT_TIMESTAMP,
   pw_hash text,
   email_verified boolean default FALSE,
   status varchar(30) default 'needs_email_verification',
   verification_key uuid,
   is_admin boolean default FALSE,
   url text,
   bio text,
   fp_verification_key uuid,
   fp_token_expire timestamp
   );

create table file_records(
   id integer primary key not null,
   filename text array -- or json?
   );

create table media_entries(
   id integer primary key not null,
   uploader integer references users(id) not null,
   title text,
   slug text,
   created timestamp default CURRENT_TIMESTAMP,
   description text,
   state text default 'unprocessed',
   queued_media_file integer references file_records(id),
   queued_task_id text,
   fail_error text,
   fail_metadata text -- json
   );

create table media_files(
   id integer primary key not null,
   media_id integer references media_entries(id) not null,
   name varchar(30) not null,
      -- "original", "thumbnail", ...
      -- maybe also "attachment"-something?
   file integer references file_records(id) not null
   );

create table media_comments(
   id integer primary key not null,
   media_entry integer references media_entries(id) not null,
   author integer references users(id) not null,
   created timestamp default CURRENT_TIMESTAMP,
   content text
   );