Processing

From GNU MediaGoblin Wiki

Revision as of 21:38, 17 August 2011

When you submit an image (or, in the future, another media type), it might not be processed immediately: it may be handed off somewhere else, processed there, and only then published. Either way, several things happen after submission: a more thorough check that the submitted media is of a valid type, thumbnail generation, and possibly scaling down, conversion, or transcoding of your media. The process in which all this happens is called "processing".

Celery

Processing is set up so that it can be handed over to another process rather than happening immediately. Resizing an image might not seem like it would take long, and perhaps it wouldn't, but keep in mind that MediaGoblin is designed to eventually handle all sorts of media types, including video and audio. If we waited for a video to finish transcoding before responding to the submission, the request might time out before processing finishes!

If you are using ./lazyserver.sh (which most developers are), this won't be the case: Celery is run in always-eager mode, which means that the task is run by the same process that started it.

However, if you want to split off processing into a separate process, that option is available. See Celery and the HackingHowto for details.
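
To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It does not use Celery's real API; TaskRunner, submit() and run_pending() are hypothetical names made up for this illustration, and the queue simply stands in for whatever a separate worker process would consume.

```python
import queue


class TaskRunner:
    """Toy stand-in for a task queue. NOT Celery's real API."""

    def __init__(self, always_eager):
        self.always_eager = always_eager
        self.pending = queue.Queue()

    def submit(self, func, *args):
        if self.always_eager:
            # Eager mode: run right now, in the caller's own process.
            return func(*args)
        # Deferred mode: record the job and return immediately.
        self.pending.put((func, args))
        return None

    def run_pending(self):
        """What a separate worker process would do with queued jobs."""
        results = []
        while not self.pending.empty():
            func, args = self.pending.get()
            results.append(func(*args))
        return results


def process_media(entry_id):
    # Placeholder for the real processing work.
    return "processed entry %s" % entry_id
```

In eager mode, submit() only returns once processing is done, which is convenient for development. In deferred mode it returns straight away, so a slow transcode does not hold up the request that submitted it.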

The general process

Note: some of the description of what's happening here comes from the processing branch, which is not yet merged but should be merged in shortly.

First of all, your submission is handed to the function mediagoblin.process_media.process_media(). That function receives the id of your MediaEntry for retrieval, uses it to find out what media type this is, and dispatches it further to the appropriate function. (Or rather, that's the future plan; for now it just passes everything over to the image processor, since we only support images ;))
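
The planned dispatch step could look roughly like the sketch below. This is an illustration only, not the code from the processing branch: the PROCESSORS registry, the FAKE_DB dictionary, the 'media_type' key and process_image() are all assumptions invented for the example.

```python
def process_image(entry):
    # Placeholder for the real image processing code.
    return "image processed: %s" % entry["title"]


# Hypothetical registry mapping a media type to its processing function.
PROCESSORS = {"image": process_image}

# Stand-in for the database lookup of a MediaEntry by id.
FAKE_DB = {1: {"title": "goblin.png", "media_type": "image"}}


def process_media(media_id, db=FAKE_DB, processors=PROCESSORS):
    """Fetch the entry by id, then hand it to the matching processor."""
    entry = db[media_id]
    # Only images are supported for now, so default to the image processor.
    media_type = entry.get("media_type", "image")
    try:
        processor = processors[media_type]
    except KeyError:
        raise ValueError("no processor for media type %r" % media_type)
    return processor(entry)
```

Supporting a new media type would then mean adding one more entry to the registry rather than changing process_media() itself.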

The file you submitted will be referenced in the MediaEntry['queued_media_file'] key in the usual filepath "list" type format. This file path is then passed to the queue_store system to retrieve the file. See Storage for more details on how the storage systems work.

While the file is being worked on (say, converted to a smaller image file or transcoded to WebM), we need a way to access that file locally. What if it's already on a local file store? Then there's no reason for us to create a new file for it. But what if it's on a remote file store? We should *conditionally* copy the file locally, but make it really easy for the processing code to not have to know what we did. The workbench helps here. It also gives a temporary place to save other files during conversion. A fresh workbench is made for every processing job by the Workbench Manager (see workbench.py), and when all is done, the workbench is destroyed.
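
The conditional copy can be sketched as follows. This is a simplified illustration under stated assumptions, not the contents of workbench.py: the LocalStorage and RemoteStorage classes, the local_storage flag, and the method names are hypothetical, and the remote store is faked with an in-memory dictionary.

```python
import os
import shutil
import tempfile


class LocalStorage:
    """Hypothetical storage whose files already live on this machine."""
    local_storage = True

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def get_local_path(self, filepath):
        return os.path.join(self.base_dir, *filepath)


class RemoteStorage:
    """Hypothetical remote storage, faked with an in-memory dict."""
    local_storage = False

    def __init__(self, blobs):
        self.blobs = blobs  # maps tuple(filepath) -> bytes

    def read(self, filepath):
        return self.blobs[tuple(filepath)]


class Workbench:
    """Temporary directory that lives for one processing job."""

    def __init__(self):
        self.dir = tempfile.mkdtemp(prefix="workbench-")

    def localized_file(self, storage, filepath):
        """Return a local path for filepath, copying only if needed."""
        if storage.local_storage:
            # Already local: hand back the existing path, no copy made.
            return storage.get_local_path(filepath)
        # Remote: copy the contents into the workbench directory.
        local_path = os.path.join(self.dir, filepath[-1])
        with open(local_path, "wb") as local_file:
            local_file.write(storage.read(filepath))
        return local_path

    def destroy(self):
        # Removes the workbench and every temporary file saved in it.
        shutil.rmtree(self.dir)
```

The processing code only ever calls localized_file() and gets a usable path back, so it never needs to know whether a copy was made.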

Anyway, at this point the media's processing code does whatever conversions it needs, saving temporary files to the workbench.

When files are converted over (including the thumbnail) they are saved into the public_store and recorded in the MediaEntry['media_files'] dictionary. The state of the MediaEntry is moved to 'processed' and everyone rejoices.
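
As a rough sketch of this final step, assuming a MediaEntry behaves like a dictionary and the public store can be modelled as a mapping from file path to contents (both simplifications made for this example, as are the 'thumb' and 'main' labels and the 'state' key):

```python
def finish_processing(entry, public_store, converted):
    """Save converted files publicly and mark the entry as processed.

    converted maps a label such as 'thumb' to a (filepath, data) pair,
    where filepath uses the usual "list" style path format.
    """
    entry.setdefault("media_files", {})
    for label, (filepath, data) in converted.items():
        # Stand-in for writing the file into the public store.
        public_store[tuple(filepath)] = data
        # Record where each converted file ended up.
        entry["media_files"][label] = filepath
    entry["state"] = "processed"
    return entry
```

For example, passing {"thumb": (["media_entries", "1", "thumb.jpg"], b"...")} as converted would store the thumbnail and record its path under entry["media_files"]["thumb"].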

TL;DR: After you submit your media, it gets stored in the queue_store, gets passed to processing (which may or may not run in a separate process via Celery), gets passed to the appropriate processing function, and the processed media gets saved in the public_store.

Errors

Handled failures

Unhandled failures
