I had a talk with Evan Prodromou about how to scale "subscription" style feeds. Here's the whiteboard he drew and the description of how it all worked.
I'm going to describe things in stages, because that's the way he described them to me, but it would be good if we did it the right (final) way from the get-go, probably. It sounds like we can retain a lot of hair by listening to this advice. :)
Evan says "we have a fairly normalized SQL setup here..."
The first row here is the "initial stage" of how statusnet did profile subscriptions, and is the most absolute minimal setup you need. It contains several things:
- A profile table. This is separate from a users table (profile table contains both local and remote "profile" subscriptions; user contains the information for like passwords, etc.)
- A subscription table, which shows who subscribes to who
- A notice table that contains the actual notices being posted (would be media entries in our case)
The original StatusNet way of doing things was to simply do a join across these three tables. This worked for a while, but eventually it started to "become total murder" to get updates, taking significant numbers of seconds. So they moved on to another solution...
The second row shows the table that was added for this solution, which contained each notice pointing at each user, and you'd do a query just grabbing a particular slice for a profile (on the left) and the notice (on the right).
Apparently this also worked for a while, but then became unusably slow. Which meant they went onto the final solution which was...
A table that has two columns: a profile id, and a packed blob that contained the 1024 latest notices. Since 1024 notices times integer bitesize is how much a single page of MySQL data is, in a single short query you could retrieve the first 50 pages of updates. Beyond 50 pages (most users never click that far), just use normal slow queries.
Evan commented that they only do this for profiles right now (not tags and etc) but they've thought about doing it, and this also doesn't work at all for a "frontpage firehose" type situation.