Blog from Feb 22, 2012

On technological choices
The following is an unedited reply to an email from a good friend of mine who is desperate to run his own startup. Thought it might be an interesting read to other non-engineers who find themselves in a similar situation.

Hey XXX, somehow our conversation from the other day got stuck in my head. First, I have to say that it looks to me like you are putting too much emphasis on technologies. You should let your engineers and the business needs guide your technological choices, not the other way around. It's going to be expensive to rewrite your application later if you need to, but so what, it's going to be much more expensive to change the business if your current one doesn't work, and yet you are still doing it. Unless you want to make it your core competency to implement the solutions using the chosen technologies, you have to trust and rely on your engineers to make the right choices for the given problems. What you primarily need from your engineers is motivation and experience with one of the reasonably good technological choices, and you should leave it at that. It's difficult to predict the future, so whether you pick RoR, Django, Play or some other hot technology today, you don't know whether it'll still be a success a few years down the road.

On the popularity of Tapestry, you asked "so how come nobody uses it?" By many measures, Java is the most popular language in the world but very few think of it as cool or hype it. The hyped Java is today pronounced "Android" or "Scala". Tapestry is a good example of what's left after the hype has deflated. Ruby on Rails is a fairly successful and popular framework, yet it was much, much more popular six years ago than it is today. In the world of Java, Tapestry, Play or Lift are still newcomers compared to such behemoths as Struts (one of the original web frameworks in Java). Java is just popular and old enough that the space is fragmented. The good news is that beyond "just" the web layer, you have libraries implemented in Java for almost any imaginable purpose that you can simply take into use.

Tapestry and Java are not always good choices for web applications, because the needs of most websites are simple, so a simpler language and a simpler framework will not only suffice, but work better because more people are more productive with it and there may be less room for grave mistakes. For lots of applications you simply don't need all of the power Java or the JVM has to offer, but when you do, you really need it. The classic example of this is a highly threaded back-end application - if you have a need for it, a typical RoR or Python-based architecture implements the web layer in the given language, but the back end in a different language. Few other languages and platforms besides Java are comprehensive enough to implement everything from the web layer to the database with the same language (although a certain Microsoft language comes to mind as an alternative). With RoR, Python or PHP, you pair it with MongoDB (implemented in C++) or perhaps MySQL (in C and C++). A Java web framework, you pair with Cassandra, Hadoop, hsqldb or any of the other strong alternatives based on need, all implemented in Java.

All that breadth makes Java more complex, which is also reflected in the pay gap between a typical Java developer and a PHP developer, with the other languages falling somewhere in between. What makes this even more interesting for you is that as a business owner you have to consider the costs, so the best solution for you might be finding the most inexpensive developers who are highly productive with their language of choice.

*** UPDATE ***: A bit after my email, my friend sent me a link to this great blog post by the Tumblr guys, to which I just had to reply. Again, I've re-posted the unedited follow-up below. Keep in mind that I'm purposefully making some simplifications to make my point clear (after all, I'm talking to a business dude). Anybody who's battled with scalability issues knows that while theoretical numbers might justify more modest hardware, it's all about building for the worst-case scenario with plenty of reserve capacity.

Nicely proves my point, no? :) Anyway, it's a surprisingly honest read about their architecture, and shows how most of the time it's all organically grown. They are getting reasonable results with their infrastructure, but they are still paying a heavy penalty for the suboptimally performing web layer - 500 web servers versus 200 database servers, where most of the latter are used for stand-by high availability. In a typical scenario, the bottleneck should almost always be the database layer, not the web layer. Sharding is a way to get scalability out of MySQL, but MySQL is still far, far away from the highest-performing SQL databases. Now, imagine if you could serve the whole database layer with 20 servers instead of 200.

For an all-Java implementation with an embedded database, it's not uncommon to require a 90th percentile response time under 10 ms with at least 1 million page views a day from a single server. Performance matters (http://docs.codehaus.org/pages/viewpage.action?pageId=190316696).
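
To put the single-server numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The function names and the peak factor of 10 are my own assumptions for illustration, not figures from the Tumblr post or any benchmark, and the calculation treats the 10 ms figure as if it were the average response time, which is a simplification.

```python
# Back-of-the-envelope capacity estimate for a single server.
# All names and the peak factor are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(page_views_per_day):
    """Average requests per second if traffic were spread evenly over a day."""
    return page_views_per_day / SECONDS_PER_DAY


def peak_rps(page_views_per_day, peak_factor):
    """Requests per second at peak, given an assumed peak-to-average ratio."""
    return average_rps(page_views_per_day) * peak_factor


def requests_in_flight(rps, response_time_seconds):
    """Little's law: concurrent requests = arrival rate * time in the system."""
    return rps * response_time_seconds


if __name__ == "__main__":
    views = 1_000_000        # page views per day, from the text
    response_time = 0.010    # 10 ms, treated here as the average (assumption)
    peak_factor = 10         # assumed peak-to-average ratio

    avg = average_rps(views)
    peak = peak_rps(views, peak_factor)
    print(f"average: {avg:.1f} req/s")
    print(f"assumed peak: {peak:.1f} req/s")
    print(f"in flight at peak: {requests_in_flight(peak, response_time):.2f}")
```

A million page views a day works out to about 11.6 requests per second on average, so even with a generous peak factor only a request or two is in flight at any moment at 10 ms each. That is the theoretical side of the argument; as noted above, real capacity planning is still about the worst case with plenty of reserve.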