Scaling Down

From GNU MediaGoblin Wiki
Revision as of 15:16, 5 March 2014 by Aleksejrs (MongoDB: a note on only-SQL-now)


Software as a service (SaaS) is undergoing a radical transformation as software developers address the stability and privacy concerns inherent to centralized services. In anticipation of the proliferation of small networked devices on which to run federated free software services, efforts to minimize MediaGoblin's hardware requirements are ongoing. This page chronicles these efforts.


Partly because of the need for the settings described on this page, MediaGoblin now supports only SQL.

MongoDB needs 0.5 GB of disk space for a fresh install (a preallocated journal may take much more, read on). It is conceivable that MediaGoblin could be configured to use SQLite or Redis, but a database abstraction layer is not a primary concern. In the meantime, there are ways to reduce MongoDB's disk requirements.

Use sparse files

Half of the 0.5 GB goes to kombu.

MongoDB database files contain a lot of NUL bytes, so sparse files can easily be used to save disk space:

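service mongodb stop
cd /var/lib/
# mongodb.new is a temporary name for the sparse copy
cp -a --sparse=always mongodb mongodb.new
mv mongodb mongodb.old
mv mongodb.new mongodb
service mongodb start

# Test that mongodb works, then remove the old copy
rm -rf mongodb.old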

Later versions of MongoDB may already do this internally. The above was needed with the MongoDB packaged in Debian stable.
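
To see why the sparse copy helps, the effect can be reproduced on a scratch file. This is only a sketch, assuming GNU coreutils (cp --sparse, du --apparent-size) and a filesystem that supports sparse files; the file names and the 8 MiB size are made up for illustration and have nothing to do with real MongoDB data files.

```shell
#!/bin/sh
# Illustration only: a file full of NUL bytes can occupy far fewer
# disk blocks than its apparent size once it is copied sparsely.
set -e
workdir="$(mktemp -d)"

# A dense file: 8 MiB of NUL bytes actually written out.
dd if=/dev/zero of="$workdir/dense" bs=1M count=8 2>/dev/null

# Copy it with the same option used for the data directory.
cp --sparse=always "$workdir/dense" "$workdir/sparse"

# Apparent size (what ls shows) versus blocks actually allocated.
apparent_kb="$(du -k --apparent-size "$workdir/sparse" | cut -f1)"
allocated_kb="$(du -k "$workdir/sparse" | cut -f1)"
echo "apparent: ${apparent_kb} KiB, allocated: ${allocated_kb} KiB"

rm -rf "$workdir"
```

Running the same two du commands on /var/lib/mongodb before and after the copy should show how much space was actually reclaimed.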

Modify mongodb.conf

Disable pre-allocation

MongoDB usually preallocates a fresh, empty data file for mmapping, so that spare space is always available. This takes up a lot of disk space. To disable it:

noprealloc = true

This does not affect the preallocation of journal files. See “smallfiles” below for that.

Namespace file size

It's not clear whether this would break everything, but the namespace size could theoretically be limited, in MB:

nssize = 1


Running GMG/celery in always-eager mode saves about half of MongoDB's disk usage, because the kombu component of celery isn't used. However, all processing is synchronous, so you have to wait in front of your browser while the server processes your uploaded media. For small media files this is not a problem.

celery_always_eager also saves RAM, because an entire database is never created or used.

smallfiles (?)

Adjusting this command line parameter is untested; however, it should make the first mmapped file much smaller. It only makes sense if you intend to keep a small amount of data in your database.

This option (which also works as a mongodb.conf entry) makes the preallocated journal files smaller: 128 MiB each in my case (3 files), instead of 1 GiB each (at least 2 files). --Aleksejrs 06:56, 9 October 2011 (EDT)
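
The mongodb.conf entries discussed on this page can be collected into one fragment. The sketch below writes it to a scratch file instead of touching the real configuration. It assumes the older ini-style mongodb.conf format, and the nssize value is the same untested one mentioned above, so treat the whole fragment as a starting point, not a verified configuration.

```shell
#!/bin/sh
set -e
# Write the fragment to a scratch file; review it before merging it
# into the real mongodb.conf (its location depends on the distribution).
conf_fragment="$(mktemp)"
cat > "$conf_fragment" <<'EOF'
# Do not preallocate data files
noprealloc = true
# Namespace file size in MB (untested, may break things)
nssize = 1
# Smaller data files and smaller preallocated journal files
smallfiles = true
EOF
cat "$conf_fragment"
```

As noted under "Other thoughts", such settings need to be in place before the "mediagoblin" database is created.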

Other thoughts

Most MongoDB options take effect before GMG is even started, so they must be set before the "mediagoblin" database is created.

Most options only affect disk space. Mmapping a 1 TB file doesn't load that file into RAM. Even on a 64-bit machine, the OS only loads the parts that are actually used into RAM, and it can write them back to the file on disk ("swap") if needed. The real problem is a large db, because it will basically either be fully in RAM, or any operation on it will swap like mad.
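
A rough way to tell which of the two cases applies is to compare the space the data directory really occupies with the machine's physical RAM. The helper below is a hypothetical sketch, not part of MediaGoblin or MongoDB: the name fits_in_ram is made up, it reads MemTotal from /proc/meminfo (so Linux only) unless a RAM size in KiB is passed explicitly, and it ignores everything else that competes for memory.

```shell
#!/bin/sh
# fits_in_ram DIR [MEM_KB]
# Succeeds if the blocks allocated under DIR are smaller than MEM_KB.
# MEM_KB defaults to MemTotal from /proc/meminfo (Linux only).
fits_in_ram() {
    dir="$1"
    mem_kb="${2:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
    # du reports allocated blocks, so holes in sparse files are not counted.
    db_kb="$(du -sk "$dir" | cut -f1)"
    echo "data: ${db_kb} KiB, RAM: ${mem_kb} KiB"
    [ "$db_kb" -lt "$mem_kb" ]
}
```

For example, `fits_in_ram /var/lib/mongodb || echo "expect heavy swapping"` prints both sizes and warns when the data directory is larger than RAM.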

Backing up and restoring the complete db might save some space/RAM, as it might remove some fragmentation; however this is untested.

See also