Includes vs Joins in Rails: When and where?
For the past few months I’ve been hiding away in a cave and working intensely on a not-so-secret project, Trado. So I thought I’d reach out once more to my fellow interwebbers, and share some knowledge I’ve learned on my journey so far.
Now with any e-commerce platform there are bound to be a lot of database relations and models relying on each other for, you guessed it, data. And with many relations come many performance aches and pains. So like any good developer you turn to the Ruby on Rails documentation and stumble upon two very similar yet very promising Active Record methods: includes and joins.
What is the difference between includes and joins?
The most important thing to understand about includes and joins is that each has its own optimal use case. The includes method eager loads the associated records, whereas joins only uses the associated table inside the query (an SQL INNER JOIN) and leaves the association itself to be loaded lazily, on demand. Both are powerful, but either can hurt performance when used in the wrong place.
If we first take a look at the Ruby on Rails documentation, the most important point made in the description of the includes method is:
With includes, Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.
In other words, when you query a table together with an associated table, the records from both are loaded into memory up front, which in turn reduces the number of database queries required to retrieve any associated data. In the example below we are retrieving all companies which have an associated active Person record:
@companies = Company.includes(:persons).where(:persons => { active: true }).all

@companies.each do |company|
  # persons is a collection, so read the name of each associated person
  company.persons.each do |person|
    person.name
  end
end
When iterating through each of the companies and displaying each person's name, we would normally have to retrieve the associated persons with a separate database query for every company. However, the includes method has already eagerly loaded the associated person records, so this block requires only a single query. Awesome, right?!
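To make the query counting concrete, here is a minimal plain-Ruby sketch of the two access patterns. It is a toy model, not Active Record: the FakeDb class, its methods, and the sample data are all hypothetical and exist only to count "queries". It also models eager loading as one batched lookup for the associated rows, which is one of the strategies Active Record can use; when the conditions reference the associated table, as in the example above, it can instead fold everything into a single joined query.

```ruby
# Toy in-memory "database" that counts how many queries are issued.
# Plain Ruby, not Active Record; every name here is hypothetical.
class FakeDb
  attr_reader :query_count

  def initialize(companies, persons)
    @companies = companies
    @persons = persons
    @query_count = 0
  end

  # One query: fetch every company row.
  def all_companies
    @query_count += 1
    @companies
  end

  # One query: fetch the persons belonging to the given company ids.
  def persons_for(company_ids)
    @query_count += 1
    @persons.select { |person| company_ids.include?(person[:company_id]) }
  end
end

# N+1 pattern: one query for the companies, then one more per company.
def names_lazily(db)
  db.all_companies.flat_map do |company|
    db.persons_for([company[:id]]).map { |person| person[:name] }
  end
end

# Eager pattern: one query for the companies, one batched query for all persons.
def names_eagerly(db)
  companies = db.all_companies
  persons = db.persons_for(companies.map { |company| company[:id] })
  by_company = persons.group_by { |person| person[:company_id] }
  companies.flat_map do |company|
    by_company.fetch(company[:id], []).map { |person| person[:name] }
  end
end

company_rows = [
  { id: 1, name: "Acme" },
  { id: 2, name: "Globex" },
  { id: 3, name: "Initech" }
]
person_rows = [
  { company_id: 1, name: "Ann" },
  { company_id: 2, name: "Bob" },
  { company_id: 3, name: "Cy" }
]

lazy_db = FakeDb.new(company_rows, person_rows)
eager_db = FakeDb.new(company_rows, person_rows)

lazy_names = names_lazily(lazy_db)
eager_names = names_eagerly(eager_db)

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

Both versions return the same names, but the lazy version's query count grows with the number of companies while the eager version's stays fixed.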
So what happens if I want to retrieve all companies with an active associated Person record, but I don’t want to display any data from the Person table? It’s starting to seem a tad overkill loading the associated table…well that’s where the joins method starts to shine!
If we use the above example again, we can start to see how easily people can become confused between the includes and joins methods, when very little has changed:
@companies = Company.joins(:persons).where(:persons => { active: true }).all

@companies.each do |company|
  company.name
end
Visually the only difference is replacing the includes method call with joins, however under the hood there is a lot more going on. The joins method uses the associated table in the database query (as an INNER JOIN) to filter the results, but only loads the Company records into memory, as the associated Person data is not required. Therefore we are not loading redundant data into memory needlessly. The trade-off is that if we wanted to use the Person data later on from the same array variable, it would require further database queries.
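One side effect of joining on a one-to-many association is worth seeing in miniature: an inner join yields one result row per matching pair, so a company with two active persons appears twice. The sketch below models that in plain Ruby with hypothetical data and a hypothetical join_active helper; it is not Active Record code. In a real Rails query the usual remedy would be to add distinct to the relation, but treat that as a pointer to check against your Rails version rather than part of the original example.

```ruby
# Plain-Ruby model of an INNER JOIN followed by a filter on the joined rows.
# Illustrative only; the data and the helper name are hypothetical.
companies = [
  { id: 1, name: "Acme" },
  { id: 2, name: "Globex" },
  { id: 3, name: "Initech" }
]
persons = [
  { company_id: 1, name: "Ann", active: true },
  { company_id: 1, name: "Al",  active: true },
  { company_id: 2, name: "Bob", active: false }
]

# Returns one company hash per matching (company, person) pair,
# keeping only the company columns, as a join-then-select would.
def join_active(companies, persons)
  companies.flat_map do |company|
    persons
      .select { |person| person[:company_id] == company[:id] && person[:active] }
      .map { company }
  end
end

joined = join_active(companies, persons)
joined_names = joined.map { |company| company[:name] }
unique_names = joined_names.uniq

puts joined_names.inspect
puts unique_names.inspect
```

Globex drops out because its only person is inactive, and Initech drops out because it has no persons at all, which is exactly the filtering behaviour an inner join gives you.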
I’m not convinced, I need some stats…stat!
Recently I fell victim to not using the awesome power behind the includes method in my Trado codebase. I noticed a severe performance leak when monitoring the database queries in my local instance server logs, which is a habit I would advise starting.
The following code in my index method was producing an abnormal number of database queries, as seen below:
def index
  @shippings = Shipping.active.all

  respond_to do |format|
    format.html # index.html.erb
    format.json { render json: @shippings }
  end
end
ActiveRecord: 265.2ms
(The database query list was so big I couldn’t fit it in the screenshot!)
As you can see, for every row in the table, it was making two database queries to grab data for the associated zones and tiers tables. When scaled up this starts to become a heavy load on resources, with an Active Record loading time of 265.2ms. So in the interest of preserving scalability and performance in my application, I modified the index method to take advantage of the includes method for the zones and tiers table associations:
def index
  @shippings = Shipping.active.includes(:zones, :tiers).all

  respond_to do |format|
    format.html # index.html.erb
    format.json { render json: @shippings }
  end
end
ActiveRecord: 2.8ms
You can quickly see here that the number of database queries has been reduced to an optimised number of just 5, which in turn drastically reduces the Active Record loading time to just 2.8ms – that’s a 99% reduction!
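As a quick check of the arithmetic, the snippet below recomputes the reduction from the two logged timings and shows how the query count of the original version grows with the table size. The timings are the figures quoted above; row_count is a hypothetical value chosen only for illustration, since the actual number of shipping rows is not stated.

```ruby
# Timings are the two Active Record figures quoted from the server logs.
before_ms = 265.2
after_ms = 2.8

# Relative reduction in Active Record time, as a percentage.
reduction_percent = ((before_ms - after_ms) / before_ms * 100).round(1)

# The original version issues 1 query for the shippings plus 2 per row
# (one for zones, one for tiers). row_count is a hypothetical example value.
row_count = 50
n_plus_one_queries = 1 + 2 * row_count

puts "reduction: #{reduction_percent}%"
puts "queries without includes for #{row_count} rows: #{n_plus_one_queries}"
```

The reduction works out to roughly 98.9%, which rounds to the 99% quoted, and the query count without includes keeps climbing with every extra row while the eager-loaded version stays at a fixed handful.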
I hope this has been helpful in explaining the pros and cons of using the includes and joins association methods, and helps you on your path to producing scalable, high performing applications!