Building and Scaling a Startup on Rails: Things We Learned the Hard Way (by Posterous S08) (axonflux.com)
249 points by arjunlall on Feb 22, 2009 | 55 comments


That is a very good article.

I'd suggest absolutely everybody with a web page read up as much as they can about optimizing web page loading. It doesn't matter if you're Posterous or you run your 2,000 daily visitors off a 256 MB VPS which is never under load, optimizing the page can result in 90% decreases in the user-perceptible time it takes to load the page. It doesn't matter if you slave over a hot memcache to save 200 ms off the database queries if your page then takes 7 seconds to render.

Shaving off two seconds, one second, half a second just prints money. Every time I do it I'm flabbergasted by how much it matters.

The HTTP cache control mentioned in the article is one excellent place to start. For the rest, Yahoo pretty much owns this field of research -- any of the presentations from the YSlow folks are worth your time.

http://www.slideshare.net/natekoechley/high-performance-web-...
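
For instance, in Rails 2.x a single controller call sets far-future cache headers on anything that rarely changes. A minimal sketch -- the controller and duration here are just an illustration:

    # expires_in is the stock Rails helper for Cache-Control;
    # :public lets proxies cache the response, not just the browser.
    class BadgesController < ApplicationController
      def show
        expires_in 24.hours, :public => true
        render :partial => 'badge'
      end
    end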

I'm kind of a lightbox junkie and I've had some good results recently by the simple expedient of having the browser preload the image (e.g. <img src="foo.jpg" style="display: none;" />) prior to the actual user interaction that displays it. There are a billion similar site-specific tricks you can do in JavaScript these days.


Oh, one more link while I'm at it:

http://www.slideshare.net/stoyan/high-performance-web-pages-...

The presentation I cited in the parent is more motivational and less stuffed with "Here is a checklist of actionable steps that if you implement will make you more money for each one you do."


I hate it when people tell me that Rails doesn't scale, because I have scaled Rails and it has nothing to do with the framework.

The reason websites can't scale is ALWAYS the DB. And the fix for the DB is always master/slave, then memcached, then sharding. For EVERY language.

Good article, sums up a lot of the tools I used to scale my stuff.

I'd like to add a little more.

1. Turn on slow-query logging in your DB and tail -f the slow query log. Find slow queries and kill them in your code, either with indices or by using several fast queries in place of one long one (see the sketch after this list).

2. Cache most reads, using both application-level caching and memcached.

3. Turn off associations, they don't play well with caching.

4. If you're using memcached, don't use the plugins.

5. If you're a serious startup, use Engine Yard, they're life savers.
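
On point 1, the fix usually lands as a migration adding an index that matches the slow query's WHERE/ORDER BY columns. A minimal sketch, with table and columns made up:

    class AddUserCreatedIndexToPosts < ActiveRecord::Migration
      def self.up
        # covers e.g. SELECT ... WHERE user_id = ? ORDER BY created_at DESC
        add_index :posts, [:user_id, :created_at]
      end

      def self.down
        remove_index :posts, :column => [:user_id, :created_at]
      end
    end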


Rails running on vanilla Ruby does have particular scaling issues due to the limitations of the Ruby garbage collector. You hit a GC cycle after every 8MB of allocation, which typically takes around 150ms to run, so GC can dominate the runtime of your requests. We've seen 80% of the runtime go to GC, even when you include database time.

This isn't necessarily a killer, but it means there's sometimes more work than you might like in scaling. There are patches to tune the garbage collector, and JRuby etc. may help, but ultimately you need to be much more aware of memory allocation than you might think.
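
A crude way to see the cost on your own interpreter, assuming plain Ruby 1.8 and nothing about your app: allocate roughly 8MB of short-lived objects and time one full collection.

    require 'benchmark'

    garbage = Array.new(8_000) { 'x' * 1_000 }  # roughly 8 MB of strings
    garbage = nil
    ms = Benchmark.realtime { GC.start } * 1000
    puts "one GC cycle: #{ms.round} ms"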


That's an application bottleneck which is fixed by adding more application servers. It has no effect on scaling whatsoever.

And by scaling here, we're talking about 1M to 10M, not 10,000 to 100,000.


"5. If you're a serious startup, use Engine Yard, they're life savers."

- If you're a serious startup that doesn't monetize off of display ads, use Engine Yard. Otherwise, you can't afford them.


If you're a serious startup, you're not monetizing off display ads.

:)


Actually, this is good advice regardless of what language you're on. The database is almost always at fault. My rule of thumb is that a query should take less than 0.001 seconds to execute. If you can get it down to that through proper indexing and whatnot, MySQL will execute queries fast enough that they don't get backed up and eventually kill the server. Don't start caching until you can write a proper query; your site probably doesn't need it if you're writing your queries correctly.
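
A quick sanity check from a Rails console, assuming MySQL (the query is made up): run EXPLAIN and confirm the plan uses an index instead of scanning.

    plan = ActiveRecord::Base.connection.select_all(
      "EXPLAIN SELECT * FROM posts WHERE user_id = 123")
    p plan  # a non-NULL "key" is good; type "ALL" means a full table scan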


... and be less afraid of de-normalizing your data!
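
The classic Rails flavor of this is a counter cache: denormalize the COUNT into a column so listing pages never aggregate. Model names here are illustrative:

    class Comment < ActiveRecord::Base
      # keeps posts.comments_count current on create/destroy, so views
      # can show post.comments_count without a COUNT(*) query
      belongs_to :post, :counter_cache => true
    end

    # requires: add_column :posts, :comments_count, :integer, :default => 0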


Pretty good run down. Actually, a very good run down.

Sphinx is cool, but there's no reason to be afraid of Solr/Lucene. It takes _very_ little Java knowledge to get it up and running, and it's very, very, really, crazy fast. Like, hundreds of thousands of searches a day on millions of documents and it's totally stable. knocks on wood

Also, Passenger is indeed better than all the mongrel + god + tweaks they're talking about. AboutUs.org (my employer) is the largest site on Passenger (that we can find) and we've had 2 actual crashes in the last 6 months, both fixed by restarting Apache.


My experience (and I have used Sphinx, Lucene, Solr and Xapian in production) is that Lucene/Solr have pretty bad performance compared to Sphinx or Xapian.

My Lucene setup began to throw deadlocks and memory exceptions pretty early on. Searching for "deadlock lucene" on Google yields 25,000 results. I later rewrote the system on Xapian, where it has run without any problems.

For live updates I would recommend using Xapian. For fairly static indexes I would recommend Sphinx (as it's _really_ fast for both indexing and searching, but it does not support live index updates yet).


Really? What size of index/documents were you doing searches on?

We're doing live updating, probably close to 1 update per second.


Thanks for the suggestions! Will probably explore Solr down the road. I've written search on top of Lucene before, just didn't have time to work too hard on getting search going in between all the other features we're trying to push out.

I have heard great things about Passenger -- been meaning to try it, since it does fix the mongrel queue problem.


> You're only as fast as your slowest query....

Use HAProxy. Take the time to configure it correctly. It's absolutely the most stable and useful load balancer I've used. It also solves this issue by talking to your backends.
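
The usual trick is maxconn 1 per backend, so requests queue inside HAProxy rather than behind one busy mongrel. A minimal sketch, with addresses made up:

    listen mongrels 0.0.0.0:80
        mode http
        balance roundrobin
        server app1 127.0.0.1:8000 maxconn 1 check
        server app2 127.0.0.1:8001 maxconn 1 check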


I would highly recommend keepalived, which is well-written software for managing ipvs, a load balancer in the Linux kernel. Very, very fast, flexible, and extremely reliable.

EDIT: ipvs/keepalived only operate at layer 4, which is one reason why they are so fast. If you need layer 7 stuff, this won't do it.


We use pen, which can be configured to avoid this issue as well. Just specify the maximum number of connections to be the same as the number of backend servers (via the -x option), and it will queue connections beyond that number and hand them to the first backend that frees up.
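
For concreteness, something along these lines (ports made up):

    # 4 backends, at most 4 concurrent connections; the rest queue in pen
    pen -x 4 80 app1:8000 app2:8000 app3:8000 app4:8000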

We've had zero issues with pen, it's been rock solid, but we're still looking at moving to Passenger at some point: seems like it will be more flexible and efficient than our current pack of mongrels.


I attempted to switch to pen when I was having issues with Perlbal. It was segfaulting under light load (light for me is heavy for most people), just like Perlbal, so I switched back. I can't say I recommend it at all.


"Sad town happens. 1 in 4 requests after Request A will go to port 8000, and all of those requests will wait in line as that mongrel chugs away at the slow request..."

I don't use Rails, or even Ruby at all if I can avoid it, so I'm sure I'm missing something obvious here, but... why in the world would anyone want to use a web server which can only handle one request at once?


It's not the webserver, it's Rails; prior to 2.2 it used a Big Giant Lock around the dispatcher, so it would serialize requests.

Of course, even without that Ruby itself will only use a single CPU, since the interpreter itself has a Big Giant Lock, but it can still use threads to multiplex requests and avoid wasting time waiting on every IO.

JRuby allows for proper concurrent multithreaded request handling; I'm surprised it's not a more popular deployment option.
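
For what it's worth, on Rails 2.2+ dropping the lock is opt-in:

    # config/environments/production.rb (Rails 2.2+)
    # removes the dispatcher-wide mutex; your own code and your
    # plugins must be thread-safe for this to be a win
    config.threadsafe!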


Sorry, what's the platonic ideal you're suggesting as an alternative? There are threaded Ruby web servers, but, just like with Python, the interpreter is mostly giantlocked. The overwhelming majority of web apps out there are running under Apache, which just like Mongrel is preforking and queueing, not running everything simultaneously.


If you have N requests hitting Apache and one of them is slow, that one slow request will run in its own process while the fast requests are sent off to other processes. The fact that each process only handles one request at once is irrelevant.


Uh, this is how Rails setups work too. You aren't talking directly to Mongrel.


Maybe I misunderstood the article -- it sounded to me like requests were being distributed between Mongrel processes and queued on the individual processes rather than being queued centrally and only allocated to individual processes when a process is free (like Apache does).


That's correct -- the load balancer passes the traffic directly to Mongrel, and each mongrel has its own internal queue / mutex.

That's why you run like 4 or 8 or 12 or N many mongrels to handle additional load.


The problem with Mongrel is that you allocate N mongrel instances at setup time. Apache, on the other hand, can dynamically allocate new processes (up to a limit) in order to meet increased demand. This is especially important for people like me, who host more than one site on a machine, and want to be able to handle load up to a certain point without fiddling with config files every time there is a spike in traffic.


"Sorry, what's the platonic ideal you're suggesting as an alternative?"

Something like Yaws, built in Erlang. With fine-grained threading that works well, you don't get the one-to-one OS process to task mapping.

But that certainly comes with its own set of tradeoffs. There are some workloads where that can be a massive win, but committing to any of the currently still-obscure languages/runtimes that can pull this off with panache means you're committing to a less-well-developed library environment.

Facebook chat runs on Erlang for a reason... and the rest of Facebook runs on PHP, also for a reason.


I get that feeling too every time I read an article like this.

I develop 90% of my stuff on the Microsoft stack, and it just plain handles anything you can throw at it out of the box. We see 2,000,000 pageviews a day steady state, and very seldom lift the CPU off of zero, with nothing more than bread and butter index and stored procedure tuning.

With that as a baseline, I just don't see why all these little startups are having such problems just keeping their websites up under a little traffic. I can't believe that these scripted ORMs are really so inefficient that you'd need to spend this amount of effort bolting 3rd-party stuff onto them just to keep them alive.

There has to be something else going on. What, exactly am I missing here?


Mongrel is multithreaded, solid, and fast; the issue was with Rails, which wasn't thread-safe until a little while ago.


Great post with obvious real world experience behind it.

I think it's a fantastic point that you should focus on optimizing your database before you start adding caching. If you can tune your DB with the right indexes and give it enough RAM to fit the whole dataset, you've got a great cache right there!

(BTW, I have been using PostgreSQL on my latest project. I'm impressed so far. It has a much better query optimizer and better indexes than MySQL.)

I also like using Solr/acts_as_solr. I haven't used Sphinx but from what I've read about setting it up it sounds incredibly fiddly. Solr, by contrast, is quite simple.


Having recently switched from Solr + acts_as_solr to Sphinx + Thinking Sphinx, I have found quite the opposite. Thinking Sphinx is simpler than Solr and required no 'fiddling'.

I had to work around some problems in acts_as_solr, such as its tendency to update solr as soon as a DB record changed (even when none of the fields solr indexes had changed). Thinking Sphinx simply updates its index in a batch process called by cron (and is super fast!). I highly recommend it.
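
For anyone comparing: a Thinking Sphinx index is just a block in the model (fields below are made up), plus the indexing rake task on a cron.

    class Post < ActiveRecord::Base
      define_index do
        indexes title, :sortable => true
        indexes body
        has user_id, created_at   # filterable/sortable attributes
      end
    end

    # Post.search "rails scaling", :with => {:user_id => 42}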


Sphinx isn't hard to set up, but (from what I've heard) Solr is a much better choice if you want very fine tuning. Sphinx is a great out of the box solution and will fit most people's needs quite well.


Rails scales just fine - the problem is that it's just really expensive to do it.

Here are some of our learnings:

1. Use query_trace to trace your DB calls and query_analyzer to automatically run EXPLAIN on each call:

http://github.com/ntalbott/query_trace/tree/master
http://github.com/jeberly/query-analyzer/tree/master

2. Use our patches to mongrel proc_title to troubleshoot slow queries: http://asemanfar.com/Request-Queue-via-Mongrel-Proctitle

3. Don't use ActiveRecord (see the sketch below).

4. Don't use any link or url helpers in Rails.

5. For that matter, don't use Rails. Rewrite your most hit components in something faster.

I love Ruby, but the simple truth of it is that we'd be saving a couple of engineers' worth of money if we weren't on Rails.
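
One hedged reading of point 3, using only stock Rails API (the query is made up): on hot paths, skip ActiveRecord object instantiation and pull plain hashes.

    rows = ActiveRecord::Base.connection.select_all(
      "SELECT id, title FROM posts ORDER BY created_at DESC LIMIT 10")
    rows.each { |r| puts "#{r['id']}: #{r['title']}" }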


I dunno about saving a couple engineers' worth of money -- that's why Rails is massively viable in the first place.

You absolutely can iterate faster with fewer people and less wasted time.

Servers are cheap. People (salaries) are expensive.

(Well.. until servers become expensive due to straight up load/scale, that is.)


Right now we're estimating that our Rails premium = 4X the salary of a San Francisco engineer. But we do have a lot of servers.


I don't know jack about Rails, but there is some good general advice here too. I would have liked to see some DB commentary that didn't treat MySQL as a foregone conclusion. I can't think of many instances where I would recommend it in general.


May I ask what you do use, exactly? MySQL has always been my go-to for simple db needs. What do you normally use, and what are your "general" cases where you wouldn't use it?


For "simple [rdms] needs" I would recommend SQLite. For anything more, PostgreSQL. I find MySQL too buggy and it diverts from the SQL standard too often (or doesn't implement enough of it) for my tastes. A properly configured postgre install (granted, not exactly trivial) will perform at least as good if not better than MySQL and its advanced functionality is extremely mature and robust, unlike MySQL's which has largely been tacked on in the current major version (views, templates, triggers, etc.)


Great article.

Background/deferred job processing has been immensely painful for us, and I have no idea what the accepted best job queue is. We're still stuck on backgroundrb, which is a nightmare. Nanite looks like overkill, and has too many deps.


Workling has worked well for us -- Rany Keddo is a phenomenal open source guy.

Tobi's deferred jobs also looked totally solid to me. That one's proven because it drives Shopify. We use his liquid plugin, which is also stellar.

That underlines a big issue with Rails dev even today -- it's really hard to know what's good / what works, and what is just some weekend project for someone.


Delayed-Job works great for Lighthouse/Tender for us, and for Github too. It's definitely a great starter queue until your needs necessitate something more heavy-duty like nanite.


Found Tobi's delayed_job (not deferred) --- I think I like the design better than Workling's; fewer moving parts, just a database backend and a rake script. The biggest problem we have with backgroundrb (which, again: nightmare) is never really knowing the state of currently running jobs.
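
The whole API surface is small -- roughly this shape, with the job and mailer names made up:

    # delayed_job serializes any object with a #perform method into a
    # jobs table; a rake-launched worker polls the table and runs them.
    class NewsletterJob < Struct.new(:user_id)
      def perform
        UserMailer.deliver_newsletter(User.find(user_id))
      end
    end

    Delayed::Job.enqueue NewsletterJob.new(user.id)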


yeah backgroundrb sucks, who wrote that shit? (me and i wish i never did)


(oh, and thank you).


We're currently using ActiveMQ with ActiveMessaging as the Rails bindings. One reason for this was that we wanted something which would be fairly language/platform neutral if in the future we decided that Ruby/Rails wasn't the right choice for certain bits of the app.

RabbitMQ is another option to consider in this area.
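
For flavor, the ActiveMessaging shape (destination and names made up): processors subscribe to a destination, and anything can publish to it.

    class OrderProcessor < ApplicationProcessor
      subscribes_to :orders

      def on_message(message)
        logger.info "received: #{message}"  # raw payload from the broker
      end
    end

    # from a controller or model:
    #   publish :orders, order.id.to_s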


We simply use an in-memory queue. Beanstalkd has worked quite well for us. (Not a rails app though)


Beanstalkd + async_observer (Rails plugin) have worked well for me. Add a Munin script to monitor the queue size (http://gist.github.com/36116) and it's a pretty effective and manageable system.


Huh, I've been quite happy with backgroundrb lately, both for repeating (cron-style) background work and for deferred processing. We've had little trouble tracking worker status once we switched to the memcached-based result cache, which allows the app to interrogate workers however it sees fit...


I'm 'stuck' on backgroundrb too. Until now I thought it was the default and best way to run processes in the b/g for a Rails site.

I'm eager to see if anyone else can recommend alternatives that they've had good experiences with.


We're using Background Job (bj) quite happily, though we've been told that Delayed Job (dj) is the new hip thing to use in its stead.


the async plugin is quite nice. you can replace

>> myobject.very_slow_method

with

>> myobject.async :very_slow_method

and it's executed in background

http://github.com/lassej/async/tree/master


Great post, you discovered a bunch of the gotchas faster than we did on our first Rails site. I wish I'd had a post like this a year ago. Thanks!


Can someone shed some extra light on the point re: reducing the number of requests to the DB for a dynamic page? When left unoptimized, Rails (and a lot of other frameworks) often ends up issuing 100 DB queries for a page.

One obvious way around this is to ensure the DB joins are done correctly. But the article mentions batching/grouping up the requests. How does that work?


Check out the :include parameter to find method calls in the ActiveRecord documentation.

Say you have 30 blog posts, and your views reference associations to the blog post's owners. Well, views are dumb and if you call post.user on each one within a loop, you end up calling User.find 30 times.

But if you do Post.find(..., :include => [:user]), Rails knows to eager load all the users up front -- and User.find never gets called 30 times.
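
Side by side, in Rails 2.x finder syntax:

    # N+1: one query for posts, then one User.find per post
    Post.find(:all, :limit => 30).each { |p| puts p.user.name }

    # eager: two queries total; p.user is already in memory
    Post.find(:all, :limit => 30, :include => [:user]).each { |p| puts p.user.name }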


Great article! BTW, I was wondering what testing strategies you used, and how you guys ensured smooth integration of features?


Rspec, Cucumber, Integrity, and a custom end-to-end testing process, run via the daemons gem, that ensures all critical aspects of the site work at all times.



