N+1 Select

#n+1-problem #orm #database #performance

You can forget about the performance of synchronized vs. unsynchronized, when you write code hitting your database that looks like this:

peeps = Database.findMyPeeps  
for (each peep in peeps)  
  address = Database.findAddressForPeep(peep)

This is called an N+1 select, although honestly it is more like a 1 + N select. You first run a query to find what you are looking for, and then you proceed to iterate over those results to do N more queries for each row in your initial query. This is a all too common reason for slow applications because of the overhead of the frequent database queries. My example above is the obvious version of this, and if you use an Object Relational Mapping (ORM) such as Hibernate or ActiveRecord there is a much more stealthy version of this anti-pattern:

peeps = PeepDAO.findMyPeeps  
for (each peep in peeps)  
   address = peep.getAddress

If Address is a table with a foreign key relationship to Peep, and the mapping is defined as a lazy load, then it will result in the same N + 1 Select problem. How do you solve this?

If you said by changing the mapping to an eager load… WRONG!! First, run the code and see the queries that your ORM is making. Get a representative set of data and time how long the section of code takes to run. Only once you have in this in place are you in a situation where you are capable of making smart optimization choices.

The first solution is to change the mapping for Peep to eager fetch addresses. This might be the right solution, but make sure you understand everywhere that Peep is used first. Changing the mapping will change the behavior of Peep everywhere it is used. You might be creating a local optimization where you speed up your for loop in this case, but pull down unnecessary data in 80% of the rest of the program.

The second solution is to write a custom query. In Hibernate you can use either HQL or the Criteria API. Most of the time this is going to be the superior solution to your N+1 Select problem.

Can anyone tell me in what situations they would use HQL vs. using the Criteria API?