Earlier today, TechCrunch’s poorly researched claim that Twitter is abandoning Ruby on Rails in favor of PHP or Java generated a lot of buzz in the Twitter and Ruby communities (the claim was later refuted by Twitter co-founder Evan Williams).
Of course, the article’s comments attracted the usual ignorant TechCrunch trolls. Most took the opportunity to pitch their technology of choice (such as PHP, Java, .NET, or Django), which they claimed would, of course, magically solve all of Twitter’s scalability issues.
I have to say I am appalled at this level of ignorance. People just don’t seem to realize that Twitter is a complex messaging application and that the front-end is only a relatively small aspect of it. Even if one particular front-end technology happens to be faster than another (and admittedly Rails, as much as I like it, is not the fastest technology out there), the difference is bound to be negligible compared to the real challenges in scaling the back-end, starting with the database (trust me, like many developers I’ve learned this the hard way ;) ). Even for a typical web application (which Twitter is not), there are many performance improvements that can be implemented at the back-end level (such as leveraging database replicas to separate writes from reads, or using Memcached to cache queries and other data), all of which can be applied equally well to any front-end framework.
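To make those two techniques concrete, here is a minimal sketch of read/write splitting plus cache-aside reads. Everything in it is an assumption for illustration: the class names `ReplicatedStore` and `CachedReader` are made up, plain dictionaries stand in for the primary database, its replicas, and Memcached, and replication is instantaneous, whereas a real replica lags behind the primary. It says nothing about how Twitter actually does it.

```python
import itertools


class ReplicatedStore:
    """Toy stand-in for a primary database with read replicas (hypothetical)."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary.
        self.primary[key] = value
        # Copy to the replicas. Real replication is asynchronous and can
        # lag; it is instantaneous here only to keep the sketch short.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across the replicas round-robin.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


class CachedReader:
    """Cache-aside reads: try the cache first, then fall back to the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # stand-in for Memcached
        self.store_reads = 0  # counts how often the database was hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        value = self.store.read(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.store.write(key, value)
        # Drop the cached copy so the next read sees the new value.
        self.cache.pop(key, None)


store = ReplicatedStore()
reader = CachedReader(store)
reader.put("user:1", "alice")
reader.get("user:1")  # misses the cache, reads from a replica
reader.get("user:1")  # served from the cache, no database read
```

Note that nothing here depends on the front-end: the same pattern looks essentially identical in Rails, PHP, or Java.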
I’m not saying that it does not make sense to consider other technologies. There might very well be a breaking point at which it makes sense to evaluate Java or even rewrite parts of the system in C/C++. In my opinion, though, this should be considered a cost-saving measure for when the application reaches a scale at which the cost of hardware far outweighs any savings from increased developer productivity (think Google), and not a magic bullet for solving fundamental scalability issues (performance != scalability!).
One of the real difficulties in scaling Twitter lies in the fact that all Twitter hits are completely personalized and need to return fresh data, making it difficult to fully leverage caching. Also, since Twitter is a social application and the returned data is determined by each user’s social graph, there is no straightforward way to shard the database by user, as one might be able to do in a typical e-commerce or enterprise application (or pretty much any non-social app…). Without knowing more about Twitter’s internal architecture and their actual profiling results, it would be foolish of me to make any concrete recommendations, particularly silly ones like “Use technology XYZ, it will magically solve all your problems!” Too bad many of the developers out there don’t seem to realize this…
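To illustrate the sharding point (and only that, since I don’t know how Twitter stores its data), here is a toy sketch. The shard count, the modulo placement scheme, and the function names are all assumptions I made up for the example. In the e-commerce case a user’s data lives on one shard; in the social case a single timeline request has to touch the shard of every author the user follows.

```python
SHARD_COUNT = 4  # hypothetical number of database shards


def shard_for(user_id):
    # Naive placement (an assumption): a user's rows live on user_id % SHARD_COUNT.
    return user_id % SHARD_COUNT


def shards_for_orders(user_id):
    # E-commerce style: a user's order history sits on that user's own shard.
    return {shard_for(user_id)}


def shards_for_timeline(user_id, following):
    # Social style: the timeline is built from the posts of everyone the
    # user follows, and each post sits on its author's shard.
    return {shard_for(author) for author in following.get(user_id, ())}


# Hypothetical social graph: user 1 follows five other users.
following = {1: [2, 3, 4, 5, 9]}

order_shards = shards_for_orders(1)
timeline_shards = shards_for_timeline(1, following)
```

With this made-up graph, the order lookup hits one shard while the timeline lookup hits all four, and that is with only five followed accounts. Storing a copy of each post on every follower’s shard avoids the fan-out on reads but moves the cost to writes, which is exactly the kind of trade-off no front-end framework decides for you.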