Scaling

Just read this post about scaling by Chad. I have to agree with him. Cheap does not mean easy :) It just costs less. I followed the link to a scaling preso by John Allspaw.

Almost all of the current discussions on the net are taking me back to the early days of my work at Yahoo!. I guess I am getting old. In any case, as many of the engineers who come to work at Yahoo! notice quickly, we almost never deal with situations 1 or 2 in the presentation. Almost always, we have to build applications for scale. Even as early as 1996, Yahoo! as a company had solved this very well. In fact, Yahoo! scaled at a very low cost compared to the competition.

For the longest time the perception - rightly so - among Yahoo! engineers was that RDBMSs could not scale to serve Yahoo! traffic. Jeremy Zawodny - back when he was still coding - changed a lot of that. I found this preso on Jeremy’s site about MySQL scaling.

I think his presentation covers all the things you need to know about scaling your site with MySQL DBs. After you have passed through all the stages mentioned in John Allspaw’s presentation, and your master DB is overloaded, you can refer to Jeremy’s preso for setting up multi-master mode.
I wonder if this multi-master mode works with 2 masters and 10 slaves.
UPDATE: Some thoughts after I posted this. I have to admit that I am no expert at scaling systems. Whatever I learnt, I learnt at Y! and continue to learn. One really simple principle that enables systems to scale is partitioning the data. When you think hard about it, all of the data doesn’t *really* need to be in the same database. The sooner you realize this about your application data and design your data model so that you can partition it across multiple databases, the easier you make your life.
For example, let’s take John Allspaw’s preso. He talks about the “many box” solution. This solution has only one master writer. Well, if you have more writes hitting your database than one machine can handle - you are SOL. Instead, you can partition your data and replicate the “many box” solution for each partition. Of course, you have to have an addressing system on your apache/php front end to know which partition to hit for which data.
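
Here is a rough sketch of what such an addressing layer might look like. It is in Python rather than PHP, and everything in it is made up for illustration: the host names, the choice of user id as the partition key, and the hash-modulo scheme are all assumptions, not what Yahoo! or anyone in the linked presos actually ran. Each partition is its own “many box” setup, with one master taking writes and a few slaves serving reads.

```python
import zlib
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Partition:
    """One "many box" setup: a single master plus read-only slaves."""
    master: str
    slaves: list

    def __post_init__(self):
        # Round-robin iterator over this partition's slaves.
        self._next_slave = cycle(self.slaves)

    def host_for(self, is_write: bool) -> str:
        # Writes always go to the master; reads rotate across the slaves.
        if is_write or not self.slaves:
            return self.master
        return next(self._next_slave)


# Hypothetical host names, purely for illustration.
PARTITIONS = [
    Partition("db0-master", ["db0-slave1", "db0-slave2"]),
    Partition("db1-master", ["db1-slave1", "db1-slave2"]),
]


def partition_for(user_id: str) -> Partition:
    # crc32 gives the same answer in every front-end process,
    # unlike Python's built-in hash() for strings.
    index = zlib.crc32(user_id.encode("utf-8")) % len(PARTITIONS)
    return PARTITIONS[index]


def route(user_id: str, is_write: bool) -> str:
    """Return the database host the front end should talk to."""
    return partition_for(user_id).host_for(is_write)
```

A front end would call something like `route("alice", is_write=True)` before opening its connection. One caveat with the modulo approach: adding a partition changes where most keys land, so existing data has to move. A lookup table that maps keys to partitions avoids that, at the cost of one more thing to keep available.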

3 Responses to “Scaling”

  1. John Allspaw Says:

    ‘many box’ solution: yep, can’t handle the writes, SOL indeed. :)

    a coupla other limitations of that layout: (without partitioning)

    - even if the Master *can* handle the writes, the Slaves may not be able to, and serve read traffic at the same time, no matter how many slaves you have. LiveJournal’s presentation last year (http://www.danga.com/words/2004_mysqlcon/mysql-slides.pdf) covered this scenario really well.

    - having one Master is still a Single-Point-of-Failure

    Jeremy’s book is chock full of great stuff, although I’m adamant about not using RAID5, whereas he still suggests it’s ok. :)

    thanks for reading my slides!

  2. Jeffrey Friedl Says:

    One problem with the many-partition DB is potential skew, with replication and with backups. You have to be very careful to either design the app so that such skew doesn’t matter, or somehow ensure that it doesn’t happen. The latter is probably impossible, though.

  3. Ravi Dronamraju Says:

    Yes. It is important to understand your data before deciding to partition it. The key steps in my mind are
    - Understanding your data to identify partitionable dimensions. Can this data be partitioned by time, location, user, or some data-specific attribute?
    - How are the partitions sized? For example, if you partition data by location, you should expect more data in the New York partition than in the Iowa partition.
    - Having different partitions means that they can also be scaled differently. For example, you can have more machines serving the California partition than the Nevada partition.
    - Of course, you have to understand your data to make these judgements
