Processing

From GNU MediaGoblin Wiki
Revision as of 03:59, 18 August 2011 by Cwebber (talk | contribs) (Information on how errors are handled)

When you submit an image (or, in the future, another media type), it might not be processed immediately: it may be passed to another process, processed there, and only then published. Either way, several things happen after submission: a more thorough check that the submitted media is of a valid type, thumbnail generation, and possibly scaling down, conversion, or transcoding of your media. The stage in which all of this happens is called "processing".

  ,-------,
  |       |                                                
  | MEDIA |                                                
  |       |                                                 ,--------,
  '-------'                                    _converted_\ | PUBLIC |
      |                                       /   media   / | STORE  |
    submit                                    :             '--------'
      |                                       :                 :
      V                                       :                 :
  .------------.                      ,------------,            :
  |            |                      |            |            V
  | submission |------queues--------> | processing |       ,-----------,
  |    view    |   for processing     |            |       |           |
  '------------'                      '------------'       |  Gallery  |
        :                                   ^              |           |
        :                                   :              '-----------'
        '           ,---------,             '
         \_save __\ |  QUEUE  |  __ fetch _/
           media  / |  STORE  |     media
                    '---------'

Celery

Processing is set up so that it can be handed over to another process rather than happening immediately. Resizing an image might not seem like it would take long, and perhaps it wouldn't, but keep in mind that MediaGoblin is being designed to eventually handle all sorts of media types, including video and audio. If we waited for a video to finish transcoding before responding to the submission, the request might time out before the transcode finishes!

If you are using ./lazyserver.sh (which most developers are), this won't be the case: Celery is run in always eager mode, which means that the task will be run by the same process that started it.
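
In Celery's own configuration terms, always eager mode corresponds to the CELERY_ALWAYS_EAGER setting (the Celery 2.x-era name). The fragment below is only a sketch of what a settings module enabling it might look like. How lazyserver.sh actually passes the setting along is not shown here, and the module itself is hypothetical.

```python
# Hypothetical celeryconfig-style settings module (Celery 2.x setting names).

# Run each task synchronously in the calling process instead of
# sending it to a separate worker.
CELERY_ALWAYS_EAGER = True
```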

However, if you want to split off processing into a separate process, that option is available. See Celery and the HackingHowto for details.

The general process

Note: some of the description of what's happening here comes from the processing branch, which is not yet merged but should be merged in shortly.

First of all, your media will be passed off to the function mediagoblin.process_media.process_media(). That function receives the id of your MediaEntry for retrieval, and uses it to find out what media type this is and dispatch it further to the appropriate function (or rather, that's the future plan; for now it just passes it over to the image processor since we only support images ;)).
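
The planned dispatch step could look roughly like the sketch below. Only the name process_media comes from the text above. The 'media_type' key, the PROCESSORS registry, the FAKE_DB dictionary and process_image are hypothetical stand-ins chosen for illustration, not MediaGoblin's actual code.

```python
# Hypothetical sketch: everything except the name process_media is
# illustrative and not taken from the MediaGoblin code base.

def process_image(entry):
    """Stand-in for the image processor; returns a label for the demo."""
    return "image processed: %s" % entry["title"]


# Registry mapping a media type to its processing function.
# Only images are registered, mirroring the current state of affairs.
PROCESSORS = {
    "image": process_image,
}

# Stand-in for the database: MediaEntry-like dicts keyed by id.
FAKE_DB = {
    1: {"title": "goblin.jpg", "media_type": "image"},
}


def process_media(media_id):
    """Fetch the entry by id, then dispatch on its media type."""
    entry = FAKE_DB[media_id]
    try:
        processor = PROCESSORS[entry["media_type"]]
    except KeyError:
        raise ValueError("no processor for %r" % entry["media_type"])
    return processor(entry)
```

Supporting a new media type in this sketch would mean adding one more entry to the registry, leaving process_media itself unchanged.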

The file you submitted will be referenced in the MediaEntry['queued_media_file'] key in the usual filepath "list" type format. This file path is then passed to the queue_store system to retrieve the file. See Storage for more details on how the storage systems work.

While the file is being worked on... say converted to a smaller image file or transcoded to webm... we need a way to access that file locally. What if it's already on a local file store? There's no reason for us to create a new file for it then. But what if it's on a remote file store? We should *conditionally* copy the file locally, but make it really easy for the processing code to not have to know what we did. The workbench helps here. It also gives a temporary place to save other files during conversion. A fresh workbench is made for every processing job by the Workbench Manager (see workbench.py), and when all is done, the workbench is destroyed.
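
The conditional-copy idea can be sketched as follows. This is not the code in workbench.py: the class layout, the localized_file and destroy method names, and the is_local flag are all assumptions made for illustration, and shutil.copy stands in for fetching from a remote storage backend.

```python
import os
import shutil
import tempfile


class Workbench:
    """Hypothetical stand-in for a workbench: a temporary directory."""

    def __init__(self):
        self.dir = tempfile.mkdtemp(prefix="workbench-")

    def localized_file(self, path, is_local):
        """Return a local path for `path`, copying only when needed."""
        if is_local:
            # Already on a local store: use the file in place.
            return path
        # Remote store: copy into the workbench. shutil.copy stands in
        # for a real download from the storage backend.
        target = os.path.join(self.dir, os.path.basename(path))
        shutil.copy(path, target)
        return target

    def destroy(self):
        """Remove the workbench and everything saved in it."""
        shutil.rmtree(self.dir)
```

The processing code only ever sees the path returned by localized_file, so it does not need to know whether a copy was made.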

Anyway, at this point the media's processing code does whatever conversions it needs, saving temporary files to the workbench.

When files are converted over (including the thumbnail) they are saved into the public_store and recorded in the MediaEntry['media_files'] dictionary. The state of the MediaEntry is moved to 'processed' and everyone rejoices.

TL;DR: After you submit your media, it gets stored in the queue_store, gets passed to processing (which may or may not run in a separate process via Celery), gets passed to the appropriate processing function, and the processed media gets saved in the public_store.

Errors

Handled failures

Certain kinds of errors we want to expect and handle. For example, if a user submitted something they claimed was an image but it had a .txt extension, it would never hit processing. If it was a text file named as .jpg, however, we wouldn't know until we tried opening it (and we don't bother until the processing stage). This is an error, but a *handled* error. There might be other kinds of handled errors too... we might not support transcoding a certain kind of file, or we might not have the libraries installed to know how to convert it.

These are handled errors, and it's good if we can raise and report them in a useful way. Because of this, media processing methods are allowed to raise errors derived from mediagoblin.process_media.errors.BaseProcessingFail. You shouldn't raise BaseProcessingFail directly, but instead a subclass of it. For example, if your user submitted a bad piece of media, you can:

  raise BadMediaFail()

Then the processing will be aborted and the MediaEntry['state'] will be saved as 'failed'. In addition, since this is a special inherited-from-BaseProcessingFail type error, the python path to this error will be saved in MediaEntry['fail_error']. (We can then later recall the exception class that was raised via MediaEntry.get_fail_exception().) Finally, if you pass any keyword arguments to the exception, these will be stored in MediaEntry['fail_metadata'] so that later on we might be able to construct some extra information about what happened.

Lastly, failure exceptions inherited from BaseProcessingFail should provide a general_message class-level attribute. This should be a "lazy_pass_to_ugettext" wrapped string that explains the error in a way that users of a MediaGoblin site might understand.

There is actually some experimental and not-ideal-but-working support for this under the path /u/username/panel/ ... assuming you are the relevant user or are an admin, you'll be able to see what recent media entry submissions failed here and why.
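
The handled-failure flow described above can be sketched as follows. The class names and the MediaEntry keys come from the text. The run_processing helper, the exception_path method, the "module:ClassName" path format, the message text and the plain dict standing in for a MediaEntry are assumptions made for illustration. The sketch also uses a plain string for general_message where the real code would use a lazily translated one.

```python
class BaseProcessingFail(Exception):
    """Base for handled failures; raise a subclass, not this class."""

    # Assumption: a plain string stands in for the lazily translated
    # message the real code would provide.
    general_message = ""

    def __init__(self, **metadata):
        super().__init__()
        # Keyword arguments are kept so they can be stored as metadata.
        self.metadata = metadata

    @classmethod
    def exception_path(cls):
        """Python path of this class (the format is an assumption)."""
        return "%s:%s" % (cls.__module__, cls.__name__)


class BadMediaFail(BaseProcessingFail):
    general_message = "Invalid file given for media type."


def run_processing(entry, processor):
    """Hypothetical wrapper: run `processor`, record the outcome."""
    try:
        processor(entry)
    except BaseProcessingFail as exc:
        # Handled failure: record where the error came from and any
        # extra details passed to the exception.
        entry["state"] = "failed"
        entry["fail_error"] = exc.exception_path()
        entry["fail_metadata"] = exc.metadata
    else:
        entry["state"] = "processed"
```

Because the stored path names the exception class, the class and its general_message can be looked up again later when showing the failure to the user.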

Unhandled failures

Exceptions that are raised that *don't* inherit from BaseProcessingFail are considered "unhandled". Your MediaEntry should still be correctly marked as 'failed', but the 'fail_error' field of the model won't be filled in, and neither can any fail_metadata be saved. Furthermore we can't provide a useful error to a user.

Nonetheless, unexpected errors do occur... some sort of IOError might happen that we don't anticipate or handle. If you are running lazyserver (or otherwise have CELERY_ALWAYS_EAGER and CELERY_EAGER_PROPAGATES_EXCEPTIONS both set to True), unhandled failures are propagated to your console. Otherwise, you may want to debug what went wrong at a later time. How to do this?

Fortunately, Celery helps us out here. Your processing results will be recorded in "celery_taskmeta", and you can find the '_id' of your result by looking at MediaEntry['queued_task_id']. The exception that was raised should be stored there, so you should be able to use it for debugging.