Twitter Architecture
Here are my two cents on improving Twitter's architecture:
I work for a mobile service provider in PH and a bill payment company in AU. I empathize with Twitter because I have been in the same situation: I also work on high-availability, high-traffic systems.
Although most of these suggestions are based on my work on machine-to-machine messaging, I think they would work for Twitter too.
DB Sharding: Current tweets (1-4 weeks old) are more likely to be accessed via the web/API than older tweets. Shard by date (e.g. monthly) so that old tweets can be moved to lower-priority storage, where their DB indexes are less likely to take up memory. The HSCALE project has an excellent slide presentation explaining how sharding is implemented.
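To make the idea concrete, here is a minimal sketch of monthly sharding. It is only an illustration under my own assumptions: the `tweets_YYYY_MM` naming scheme, the one-month "hot" window, and the function names are hypothetical, and this is not how HSCALE or Twitter actually does it.

```python
from datetime import datetime

# Assumption: shards up to this many months old stay on the fast, memory-heavy servers.
HOT_WINDOW_MONTHS = 1


def shard_for(created_at: datetime) -> str:
    """Return the (hypothetical) monthly shard name a tweet belongs to."""
    return f"tweets_{created_at.year:04d}_{created_at.month:02d}"


def is_hot(created_at: datetime, now: datetime) -> bool:
    """True if the tweet's shard falls inside the hot window."""
    age_in_months = (now.year - created_at.year) * 12 + (now.month - created_at.month)
    return age_in_months <= HOT_WINDOW_MONTHS


def shards_between(start: datetime, end: datetime) -> list:
    """List every monthly shard a date-range query has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"tweets_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

A query for recent tweets then only touches the one or two newest shards, while the old shards can sit on cheaper machines with colder indexes.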
Decentralize: Twitter is a little bit like a blog, only a lot simpler, but it is also like a messaging system. Proposed solutions that treat Twitter purely as a messaging system will therefore not completely solve the problem. As with a patient in a hospital, assuming that all observed symptoms stem from a single disease is often fatal.
Twitter is unlike IM, which basically has a 1-to-1 or 1-to-many relation in terms of message delivery, and where the largest group of recipients will always be finite. Twitter, on the other hand, suffers from a 1-to-infinite relation! Because of this, it is crucial that all data reads be decentralized, which is how blogging systems work.
The API for reading data from public accounts should be served via RSS only, so that it can be redistributed evenly by aggregators like FeedBurner. Of course there is a sacrifice: the system will lose its just-in-time message delivery.
Anyway, even Google can't crawl the net and display changes in its search results that fast; why should Twitter even try?
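As a rough sketch of what an RSS-only read path could look like, the snippet below builds an RSS 2.0 feed for a public account and answers conditional requests with an ETag, so aggregators and caches can absorb repeat reads. The function names, the `example.com` URL, and the 5-minute `ttl` are my own assumptions for illustration, not anything Twitter or FeedBurner actually uses.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(username, tweets):
    """Build an RSS 2.0 document (as bytes) from (text, created_at) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{username} updates"
    ET.SubElement(channel, "link").text = f"http://example.com/{username}"  # placeholder URL
    ET.SubElement(channel, "description").text = f"Public updates from {username}"
    ET.SubElement(channel, "ttl").text = "5"  # assumption: readers may cache for 5 minutes
    for text, created_at in tweets:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "pubDate").text = format_datetime(created_at)
    return ET.tostring(rss, encoding="utf-8")


def respond(feed_bytes, if_none_match=None):
    """Return (status, headers, body); send 304 when the client's copy is current."""
    etag = '"' + hashlib.sha1(feed_bytes).hexdigest() + '"'
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    headers = {"ETag": etag, "Content-Type": "application/rss+xml"}
    return 200, headers, feed_bytes


feed = build_feed("alice", [("hello world", datetime(2008, 5, 1, tzinfo=timezone.utc))])
status, headers, body = respond(feed)
status_again, _, body_again = respond(feed, if_none_match=headers["ETag"])
```

The second call stands in for an aggregator polling again with `If-None-Match`: as long as the account has not posted anything new, the server only has to compare a hash and send back an empty 304.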
Lastly, Data localization: Take advantage of emerging technologies such as Google Gears, which allows data to be persisted locally on the user's computer. This greatly reduces the need to transfer data, lessens the impact of DB reads, and boosts usability by improving UI responsiveness.
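Here is a small sketch of the local-cache idea, written in Python with `sqlite3` as a stand-in for a Gears-style local database. A real Gears client would be JavaScript in the browser, and `fetch_since` is a hypothetical callback that asks the server only for tweets newer than the last one already stored.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local tweet cache."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, body TEXT)")
    return db


def sync(db, fetch_since):
    """Pull only tweets newer than the newest cached one; return how many arrived."""
    last_id = db.execute("SELECT COALESCE(MAX(id), 0) FROM tweets").fetchone()[0]
    new_rows = list(fetch_since(last_id))  # hypothetical server call
    db.executemany("INSERT OR IGNORE INTO tweets (id, body) VALUES (?, ?)", new_rows)
    db.commit()
    return len(new_rows)


def timeline(db, limit=20):
    """Render the timeline from local data only, newest first."""
    rows = db.execute("SELECT id, body FROM tweets ORDER BY id DESC LIMIT ?", (limit,))
    return rows.fetchall()


# Usage with a fake in-memory "server" standing in for the real API.
server = [(1, "first"), (2, "second")]


def fake_fetch(last_id):
    return [(i, body) for i, body in server if i > last_id]


cache = open_cache()
first_sync = sync(cache, fake_fetch)   # transfers both tweets
second_sync = sync(cache, fake_fetch)  # nothing new, nothing transferred
```

The UI reads from `timeline` every time, so it stays responsive even when the server is slow, and each sync only moves the rows the client does not have yet.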
 




