Complex SQL Queries with Rails

February 18, 2007 1:39 PM

Rails is a powerful framework - it abstracts many of the underlying database concepts so that you never have to think about them. Of course, sometimes it abstracts away too many.

Frequently I run into a case where I want more data in an ActiveRecord than is actually available from my SQL schema. For instance, I often want to include data from another table through a JOIN or sub-SELECT, or maybe I want to be able to ORDER BY data in a table that's in a belongs_to relationship with the current model.

To illustrate why this is useful, let's assume that we have a table, Pet, which has a PetType. Our schema looks something like this:

pet_types Table:

idname
1Dog
2Platypus
3Cat

pets Table:

idnamepet_type_id
1Santa's Little Helper1
2Snowball3
3Brian1
4Fritz3
5Felix3

Given this schema, there are two things I want to be able to do:

  1. Get a list of pets, ordered by their type. But I want to order on pet_types.name, not pet_types.id. (That is, I want them in order of "Cats", "Dogs", "Platypi", not whatever order their ids are.)
  2. Get a list of pet types, including a count of the number of pets which are of that type.

Admittedly, in this contrived example, I could do these both in rails without making any changes to the model and without too much pain. But with more advanced queries, it's much easier to let your database engine do the work. And rails is relatively graceful about allowing you to do this -- you do need to know a little SQL, but you only need to change a single method in your model.

Ordering by a foreign column
Let's say you wanted to display a list of pets ordered by type. We'd like our database engine to do this for us, since this is one of the things it's optimized at. Since ActiveRecord::Base's various find_* methods all ultimately call find_every(), we need only override that method to add a more complex SQL query. We just need to make one change in our model to support this:

In models/pet.rb:

def Pet.find_every(args = { })
  sql = "SELECT pets.*, pet_types.name AS pet_type_name " +
      "FROM pets, pet_types " +
      "WHERE pets.pet_type_id = pet_types.ID"

  sql += " AND " + sanitize_sql(args[:conditions])
      if(args[:conditions])
  sql += " ORDER BY " + sanitize_sql(args[:order])
      if(args[:order])
  sql += " LIMIT " + sanitize_sql(args[:limit].to_s)
      if(args[:limit])
  sql += " OFFSET " + sanitize_sql(args[:offset].to_s)
      if(args[:offset])

  find_by_sql(sql)
end

Now we can say Pets.find(:all, :order => 'pet_types.name') to sort by pet type. This will give us the following:

idnamepet_type_idpet_type_name
2Snowball1Cat
4Fritz1Cat
5Felix1Cat
1Santa's Little Helper1Dog
3Brian1Dog

Including data from another table
Let's say that we wanted to provide a list of pet types and inc lude the number of pets of that type. We could certainly load each pet type, then do a Pet.count() for each type, but it may be advantageous to let the database engine do this for us.

In models/pet_type.rb:

def PetTypes.find_every(args = { })
  sql = "SELECT pet_types.*, " +
      "(SELECT COUNT(pets.id) FROM pets " +
      "WHERE pets.pet_type_id = pet_type.id) AS pet_type_count " +
      "FROM pet_types "

  sql += " AND " + sanitize_sql(args[:conditions])
      if(args[:conditions])
  sql += " ORDER BY " + sanitize_sql(args[:order])
      if(args[:order])
  sql += " LIMIT " + sanitize_sql(args[:limit].to_s)
      if(args[:limit])
  sql += " OFFSET " + sanitize_sql(args[:offset].to_s)
      if(args[:offset])

  find_by_sql(sql)
end

Now, we get a new column - pet_type_count which we can read (but not write to, of course) as if it were part of the table. We can use it just for display, or we can sort on it just like above. PetType.find(:all, :order => 'pet_type_count') will produce:

idnamepet_type_count
2Platypus0
1Dog2
3Cat3

While these are trivial examples, this sort of overriding can become exceptionally useful when you have a lot of data being computed by the SQL server. If you have a model with several belongs_to or has_many relationships, it's a lot easier to let the SQL server compute these sort of details itself, rather than trying to do it in each method of your model.

Ultimately, you can put any arbitrary SQL into your find_every() method, and rails will continue to work as usual, provided you follow these three simple rules:

  1. Don't exclude columns, only include new ones. Rails examines your table's columns the first time you use a model, and it will expect those columns to be in the results of any query.
  2. Fully qualify conflicting column names. That is, say: :order => 'pets.name' instead of just 'name'.
  3. Ensure that you handle the arguments to find_every(). That is, remember to handle :conditions, :order, :limit and :offset. Otherwise you run the risk of breaking other finds.

4 Comments

Comment by Michael June 3, 2007 10:20 PM

Wow, I've been scouring the net for this information for weeks. I'm new to Rails (and Ruby), but this page has enabled me to get an important feature working that I was just about ready to give up on in my web app. Thank you so much for posting this.

Comment by D W August 6, 2007 11:32 PM

A very helpful post. Thanks!

Comment by David Reese April 21, 2008 8:43 AM

Hey, don't do that. When your ruby starts looking like php, it's time to be suspicious!

(Zeroth of all, pet_types should be plural, like all table names, and PetType should be singular, like all models.)

First of all, you can substitute all the sql += stuff with a with_scope. something like

  def PetTypes.find_every(options = {})
    with_scope :find => options do
      find(:all, :select => 'pet_types.*, (SELECT COUNT(pets.id) FROM pets WHERE pets.pet_type_id = pet_type.id) AS pet_total'
      )
    end
  end

But really you should use a counter cache. that is,

class Pet < ActiveRecord::Base 
belongs_to :pet_type, :counter_cache => true 
end

(with the appropriate :pet_count field in the pet_type db schema)

The thing about rails is it's designed to not have to do the ugly stuff, so when you are doing ugly stuff on a daily basis, it's probably not good practice, and there's probably a better way.

Comment by Edward Thomson April 21, 2008 10:37 PM

Hi David,

Thanks for the comment, and the interest.

Unfortunately, you can't execute ActiveRecord::Base's find() method from inside find_every(), or you'll get infinite recursion. As I mentioned, the point of this is to overload find_every() so that find() will do whatever SQL magic you have in mind, without actually having to know about that bit of complexity.

find_by_sql(), however, does a query right to the database, without going through find_every(). (In fact, ActiveRecord::Base::find_every() calls find_by_sql, and the code above was mostly stolen from that.

I admit that my example was perhaps overly trivial, and as you point out, a counter_cache would have been a more appropriate solution to this problem, but I wanted a simple example that didn't get bogged down in a stored procedure.

While I agree that rails abstracts away much of the ugly stuff -- there are times when you need to get down to the database level, whether to execute complex joins, sub-selects or executing stored procedures.

To be fair, there may be a better (or, more rails-y) way to do this, but this has been working for me for a while with some nasty SQL on a rails project.

Thanks again!
-Ed

Leave a comment

Edward Thomson is a Software Engineer at Teamprise, where he develops cross-platform client solutions for Microsoft Team Foundation Server, with an emphasis on Macintosh compatibility and IDE integration.