Rails 2.3: Batch Finding
If you've ever worked with a huge number of Active Record objects and watched your server memory at the same time, you may have noticed considerable bloat in your Rails processes. That's because Active Record doesn't support database cursors (alas!), so all of those records come into memory at once.
In Rails 2.3, ActiveRecord::Base is adding two methods to help with this problem: find_in_batches and each. Both of these methods fetch records in groups of 1000 by default, allowing you to process one group before proceeding to the next, and keeping the memory pressure down.
find_in_batches is the basic method here:
[sourcecode language='ruby']
Account.find_in_batches(:conditions => {:credit => true}) do |accts|
  accts.each { |account| account.create_daily_charges! }
end
[/sourcecode]
find_in_batches takes most of the options that find does, with the exception of :order and :limit. Records will always be returned in order of ascending primary key (and the primary key must be an integer). To change the number of records in a batch from the default 1000, use the :batch_size option.
The each method provides a wrapper around find_in_batches that returns individual records:
[sourcecode language='ruby']
Account.each do |account|
  account.create_daily_charges!
end
[/sourcecode]
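To make the batching idea concrete, here is a minimal plain-Ruby sketch of walking a table in ascending primary-key order, one batch at a time. It does not use Rails at all: ROWS, each_batch and each_row are hypothetical names invented for this illustration, and the "id greater than the last one seen" strategy is an assumption about how such batching can be done, not a copy of the Active Record source.

```ruby
# Plain-Ruby sketch of primary-key batching; no Rails required.
# ROWS stands in for a database table (hypothetical data).
ROWS = (1..25).map { |id| { :id => id, :credit => id.odd? } }

# Hypothetical helper: yields arrays of at most batch_size rows,
# always walking the "table" in ascending :id order.
def each_batch(rows, batch_size = 1000)
  last_id = 0
  loop do
    # Roughly "WHERE id > last_id ORDER BY id ASC LIMIT batch_size"
    remaining = rows.select { |r| r[:id] > last_id }
    batch = remaining.sort_by { |r| r[:id] }.first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
  end
end

# Hypothetical wrapper in the spirit of each: hands out one row at a time.
def each_row(rows, batch_size = 1000)
  each_batch(rows, batch_size) { |batch| batch.each { |row| yield row } }
end

batch_sizes = []
each_batch(ROWS, 10) { |batch| batch_sizes << batch.size }

seen_ids = []
each_row(ROWS, 10) { |row| seen_ids << row[:id] }
```

Only one batch is held in memory at a time, which is the whole point. The sketch also suggests why an integer primary key and a fixed ordering matter here: the next batch is located purely by comparing ids against the last one processed.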
A couple of caveats: first, if you're trying to loop through fewer than 1000 or so records, you should avoid the overhead of batches and use something like Account.all.each or a regular finder to get the records. Second, if the table is very active (i.e., has a constant stream of inserts and deletes), using the batch methods may miss records due to changes in the table between batches (whereas finding all records will at least give you a complete point-in-time snapshot).
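The second caveat can be illustrated with a small plain-Ruby simulation. Everything here is hypothetical: the table array stands in for a database table, and the delete and insert are staged by hand after the first batch to imitate concurrent activity. It is a sketch of the effect under those assumptions, not a reproduction of real database behavior.

```ruby
# Hypothetical in-memory "table" with integer ids 1 through 6.
table = (1..6).map { |id| { :id => id } }

# Snapshot: read every row up front, as loading all records would.
snapshot_ids = table.map { |r| r[:id] }

# Batched walk (batch size 3) with writes landing between batches.
batched_ids = []
last_id = 0
first_pass = true
loop do
  remaining = table.select { |r| r[:id] > last_id }.sort_by { |r| r[:id] }
  batch = remaining.first(3)
  break if batch.empty?
  batched_ids.concat(batch.map { |r| r[:id] })
  last_id = batch.last[:id]
  if first_pass
    # Simulated concurrent activity after the first batch:
    table.reject! { |r| r[:id] == 5 }  # row 5 deleted before it was read
    table << { :id => 7 }              # row 7 inserted mid-iteration
    first_pass = false
  end
end
```

In this run the snapshot contains rows 1 through 6, while the batched walk never sees row 5 and picks up row 7, which did not exist when the iteration started. Neither result is wrong, but only the snapshot reflects the table at a single moment.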