User:Cwebber/braindumps: Difference between revisions

From GNU MediaGoblin Wiki
Latest revision as of 16:58, 23 July 2011

Braindumps! Go, go, go!

Database stuff

Migrations

What are migrations?

Sometimes the way we store data changes. We might add a new field or deprecate an old field. Even though MongoDB allows us to store things very "flexibly", it's important that as we change our schema (add new fields, remove fields, rename fields, restructure some data) that our database is updated to have things stored in the same form that we expect to access them in our codebase.

How to run

Pretty simple! Just run:

./bin/gmg migrate

How to add a new migration

Migrations are handled in:

mediagoblin/db/migrations.py

Migrations aren't too complex! Basically they're just python functions that take a pymongo database as their sole argument.

Note, this is a pymongo database and NOT a mongokit database! We don't want people to use the ORM here, to avoid a chicken-and-egg paradox: our ORM might have tools that expect things to already be at our current schema, which is exactly what a pending migration hasn't established yet.

So we force people to use the simple pymongo database API, which is itself not too hard!

Let's look at one from the unit tests:

@RegisterMigration(1, TEST_MIGRATION_REGISTRY)
def creature_add_magical_powers(database):
    """
    Add lists of magical powers.

    This defaults to [], an empty list.  Since we haven't declared any
    magical powers, all existing monsters should start with an empty list.
    """
    database['creatures'].update(
        {'magical_powers': {'$exists': False}},
        {'$set': {'magical_powers': []}},
        multi=True)

This is a fairly simple example: we use the update command to find every document that has no 'magical_powers' field and set that field to an empty list. For more on update commands and mongodb, see:

http://www.mongodb.org/display/DOCS/Updating

You'll notice that preceding the migration is a decorator called RegisterMigration. This takes two arguments: the number of this migration (increment this by one from whatever the last migration was) and the "migration registry" we'll be storing it in (probably MIGRATIONS). The decorator records your migration in the registry under that number. The number is compared against the "current_migration" value stored in your database's 'app_metadata' collection, in the document with '_id': 'mediagoblin', so it knows which new migrations must be run.
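
To make the bookkeeping concrete, here is a minimal sketch of how a numbered registry like this could work. It is a stand-in written for illustration, not MediaGoblin's actual RegisterMigration code: the names register_migration and pending_migrations are made up, and the migration bodies are empty.

```python
# Hypothetical stand-in for the registry machinery; not MediaGoblin's code.
MIGRATIONS = {}


def register_migration(number, registry=MIGRATIONS):
    """Return a decorator that records a migration under its number."""
    def decorator(migration):
        if number in registry:
            raise ValueError("migration %d is already registered" % number)
        registry[number] = migration
        return migration
    return decorator


def pending_migrations(registry, current_migration):
    """Return (number, function) pairs newer than current_migration, in order."""
    return [(number, registry[number])
            for number in sorted(registry)
            if number > current_migration]


@register_migration(1)
def add_magical_powers(database):
    pass  # a real migration would modify the database here


@register_migration(2)
def add_bio_html(database):
    pass
```

A database whose stored "current_migration" is 1 would only need the second function run, which is what pending_migrations(MIGRATIONS, 1) returns.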

Another example, from the real migrations in migrations.py:

@RegisterMigration(1)
def user_add_bio_html(database):
    """
    Users now have richtext bios via Markdown, reflect appropriately.
    """
    collection = database['users']

    target = collection.find(
        {'bio_html': {'$exists': False}})

    for document in target:
        document['bio_html'] = cleaned_markdown_conversion(
            document['bio'])
        collection.save(document)

This one is slightly more complicated, but still not too hard. It just takes the value from the key 'bio', runs it through cleaned_markdown_conversion, and stores the result in 'bio_html'.

(Just one more note, if you're using the update method, for now please don't use the '$rename' modifier as it isn't supported in versions of mongodb used in most current stable distributions.)
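
If you do need to rename a field without '$rename', one option is to do it document by document, in the same find-then-save style as the bio_html migration. The sketch below is only an illustration: rename_key is a made-up helper, and the collection and key names in the comment are hypothetical.

```python
def rename_key(document, old_key, new_key):
    """Move document[old_key] to document[new_key] in place.

    Returns True if the document changed, so the caller knows to save it.
    """
    if old_key not in document:
        return False
    document[new_key] = document.pop(old_key)
    return True


# Inside a migration this might be used roughly like so (untested sketch,
# with made-up collection and key names):
#
#     collection = database['creatures']
#     for document in collection.find({'num_legs': {'$exists': True}}):
#         if rename_key(document, 'num_legs', 'leg_count'):
#             collection.save(document)
```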

Indexing

Some of the following is extracted straight from mediagoblin/db/indexes.py

Running latest updates / deprecation of indexes

./bin/gmg migrate

Yes, this is the same as the migration command.

For developers

Overview

Quick summary on indexes generally:

  • Basically, indexes make querying fast. MongoDB doesn't auto-create indexes though; we have to specify them.
  • Core things we're working on require indexes. Querying on multiple keys at once requires a multi-key (compound) index, since MongoDB currently lacks an algorithm to combine multiple single-key indexes.
  • The ordering of keys in multi-key indexes matters.
  • When adding new queries, new fields, etc., discuss whether or not an index is appropriate! New indexes do have a performance and memory penalty, but not using indexes means slow queries.

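For those touching indexes, you should read:

  • http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
  • http://www.mongodb.org/display/DOCS/Indexes
  • http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ
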
To add new indexes

Indexes are recorded in the following format:

ACTIVE_INDEXES = {
    'collection_name': {
        'identifier': {  # key identifier used for possibly deprecating later
            'index': [index_foo_goes_here]}}}

Any other keys in an entry are passed as parameters to the create_index function (unique=True, etc).
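
As an illustration, here is what a filled-in entry might look like, and how the 'index' key could be separated from the extra create_index parameters. The collection, identifier and field names are made up, and split_index_entry is a hypothetical helper, not something from indexes.py.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / DESCENDING

# Hypothetical entry; the collection, identifier and field names are made up.
ACTIVE_INDEXES = {
    'media_entries': {
        'uploader_created_unique': {
            'index': [('uploader', ASCENDING), ('created', DESCENDING)],
            'unique': True}}}


def split_index_entry(entry):
    """Separate the key list from the extra create_index parameters."""
    options = dict(entry)  # copy, so the registry itself is left untouched
    keys = options.pop('index')
    return keys, options


keys, options = split_index_entry(
    ACTIVE_INDEXES['media_entries']['uploader_created_unique'])
# With a real pymongo collection this would become something like:
#     collection.create_index(keys, name='uploader_created_unique', **options)
```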

Current indexes must be registered in ACTIVE_INDEXES... deprecated indexes should be marked in DEPRECATED_INDEXES.

Remember, ordering of compound indexes MATTERS.
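
Why ordering matters can be seen with a rough analogy: a compound index on (uploader, created) behaves somewhat like a list of tuples kept in sorted order. The sketch below uses plain Python and made-up data; it is a model of the idea, not of MongoDB's actual index structure.

```python
import bisect

# A compound index on (uploader, created) behaves roughly like a list of
# tuples kept in sorted order. The data below is made up.
index = sorted([
    ('alice', 3), ('bob', 1), ('alice', 1), ('carol', 2), ('bob', 2)])


def lookup_by_uploader(index, uploader):
    """Binary-search to the contiguous run of entries for one uploader."""
    start = bisect.bisect_left(index, (uploader,))
    end = start
    while end < len(index) and index[end][0] == uploader:
        end += 1
    return index[start:end]


def lookup_by_created(index, created):
    """No shortcut here: entries sharing a second key are scattered."""
    return [entry for entry in index if entry[1] == created]
```

Queries on the first key can jump straight to the right spot, while queries on the second key alone have to scan everything. Swapping the key order swaps which query is the fast one.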


To remove deprecated indexes

Deprecating an index works the same way: just move its entry into the deprecated indexes mapping.

DEPRECATED_INDEXES = {
    'collection_name': {
        'deprecated_index_identifier1': {
            'index': [index_foo_goes_here]}}}

... etc.

If an index has been deprecated, that identifier should NEVER BE USED AGAIN. E.g., if you previously had 'awesomepants_unique', you shouldn't use 'awesomepants_unique' again; create a totally new name or at worst use 'awesomepants_unique2'.

This is because the index name is how we track whether or not the index is installed. Reusing a name makes it hard to tell the old index from the new one. So just use a new name!
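
A small sketch of name-based tracking shows where reuse goes wrong. This is an assumption about how such tracking could be written, not the code in indexes.py; plan_index_changes is a hypothetical function working on one collection's mappings.

```python
def plan_index_changes(active, deprecated, installed):
    """Work out which index identifiers to create and which to drop.

    active / deprecated map identifier -> index entry for one collection;
    installed is the set of index names already present in the database.
    """
    reused = set(active) & set(deprecated)
    if reused:
        raise ValueError(
            "identifiers both active and deprecated: %s" % sorted(reused))
    to_create = sorted(name for name in active if name not in installed)
    to_drop = sorted(name for name in deprecated if name in installed)
    return to_create, to_drop


active = {'awesomepants_unique2': {}, 'slug_unique': {}}
deprecated = {'awesomepants_unique': {}}
installed = {'awesomepants_unique', 'slug_unique'}
to_create, to_drop = plan_index_changes(active, deprecated, installed)
```

Since only names are compared, a new index that reused 'awesomepants_unique' would look like it was already installed and would never be created with its new definition.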

What are all the fields in the database?

Check the docstrings in mediagoblin/db/models.py

Writing tests

Writing tests is not too hard. Just look at other examples in mediagoblin/tests/

Running tests

Simply use:

 ./runtests.sh

This basically calls ./bin/nosetests and sets a necessary CELERY environment variable. Any arguments that you pass in get passed to the nosetests command itself.

There are some useful arguments you can pass in while debugging... in particular:

  • --pdb - Enter pdb whenever an exception is raised
  • --pdb-failures - Enter pdb whenever an assertion failure happens (honestly, I almost always run --pdb and --pdb-failures at the same time)
  • --nocapture - Don't capture stdout. Useful for catching print statements or if you've done a pdb.set_trace() (if you do pdb.set_trace() and *don't* do this, the tests appear to hang, but haven't really... you just can't see the prompt)

Writing tests

Look at some other examples in the various test modules and read the docs:

http://somethingaboutorange.com/mrl/projects/nose/1.0.0/

But basically you just want to write functions that start with "test_" (Or classes similarly starting with Test* and methods like test*).
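
For instance, a test module following those naming rules could look like the sketch below. The slugify function is a made-up function under test, included only so the example is self-contained.

```python
def slugify(title):
    """Hypothetical function under test: turn a title into a URL slug."""
    return title.strip().lower().replace(' ', '-')


def test_slugify_replaces_spaces():
    # Picked up because the function name starts with "test_"
    assert slugify(' Hello World ') == 'hello-world'


class TestSlugify(object):
    # Picked up because the class name starts with "Test"
    def test_lowercases(self):
        assert slugify('GOBLIN') == 'goblin'
```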

For things where you want access to an "application", use the setup_fresh_app decorator, eg like:

from mediagoblin.tests.tools import setup_fresh_app

@setup_fresh_app
def test_register_views(test_app):
    [...]

This will set up a fresh "application" for you, turn on the "testing bucket" stuff (more on this in a bit) in util.py if it isn't already on, and clear the "testing buckets" if they have stuff in them. You can also skip the decorator and call these components manually. So let's describe those:

  • mediagoblin.tests.tools.get_test_app() - get an application for testing... same thing as is passed into the function with the setup_fresh_app decorator, and wipes the database and session and public_storage/queued_storage things. What's returned is a WebTest (http://pythonpaste.org/webtest/) wrapper around the application.
  • mediagoblin.util._activate_testing() - sets TESTS_ENABLED, which means that various tools will cache extra information about templates visited, context passed in, emails sent, etc.
  • mediagoblin.util.clear_test_buckets() - Clears all the types of buckets described above.

One of the less obvious of these "buckets" is the util.TEMPLATE_TEST_CONTEXT bucket. Assuming you use util.render_template() or util.render_to_response(), util.TEMPLATE_TEST_CONTEXT will be populated with the name of the template as the key and the context supplied as the value.

Config stuff

You might notice that MediaGoblin has two configuration files... one is paste.ini and one is mediagoblin.ini. This is only sort of true: paste.ini is only really used to set up the application to be run by Paste Deploy. Anyway, the focus of this section is on the mediagoblin.ini file.

In short: it uses ConfigObj (http://www.voidspace.org.uk/python/configobj.html). There are a few particular features about this...

  • We can set defaults and specify types in the file mediagoblin/config_spec.ini
  • It *is* possible to add custom validators / types. See the ConfigObj and the Validator (http://www.voidspace.org.uk/python/validate.html#adding-functions) documentation.
  • You'll notice that there's a whole section in here that's celery related... we can configure celery from the config file.