I fell in love with Ruby's OpenStruct class a few months ago after reading the posts about it on "Jay Fields' Thoughts" and "ERR THE BLOG".
I immediately liked the concept. It was right in line with how easily you can get things done in Ruby. I carefully tucked the OpenStruct nugget into the back of my head for later use. It wasn't long before I had the opportunity to implement a solution with it. Due to issues and limitations with ActiveRecord::Base (another time, another post), Tobi needed a more flexible and generic architecture for reporting (again, another post). Anyway, in order to get this done quickly, I pulled out the OpenStruct nugget.
We ended up with a couple of really simple classes to encapsulate our data. Now we could simply extend DataRow and decorate it with the particulars of the specific report.
- # code snippet
- def query
- query = <<-EOS
- SELECT ...
- FROM ...
- INNER JOIN ...
- WHERE ...
- GROUP BY ...
- ...
- EOS
- end
- # code snippet
- def run
- rows = ActiveRecord::Base.connection.select_all(query)
- rows.collect{|row| Backorder::Row.new(row)}
- end
- require 'ostruct' # OpenStruct lives in the standard library
- class DataRow < OpenStruct
- def initialize(row)
- # Allows us to use an anonymous ActiveRecord or a hash
- attributes = case row
- when Hash then row
- else row.attributes
- end
- super(attributes)
- end
- end
- module Backorder
- class Row < DataRow
- def order_number
- order_id.to_i + 1000
- end
- def order_ids
- Array.postgres_to_ruby(orders)
- end
- def purchase_order_ids
- Array.postgres_to_ruby(purchase_orders)
- end
- end
- # ...
- end
The preceding code gave us a simple and generic way to use complex SQL to load from an arbitrary number of tables, as well as temporary tables. Additionally, we did not have to pollute models with report-specific logic. The testing benefits were another major plus!
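To make the idea concrete, here is a minimal sketch of how such a row behaves when it is handed a plain Hash with string keys. The names `OstructDataRow` and `ExampleRow` and the sample values are hypothetical stand-ins for illustration, not one of our actual reports.

```ruby
require 'ostruct'

# Hypothetical copy of the OpenStruct-backed DataRow idea:
# accept a Hash, or anything that responds to #attributes.
class OstructDataRow < OpenStruct
  def initialize(row)
    attributes = row.is_a?(Hash) ? row : row.attributes
    super(attributes)
  end
end

# Hypothetical report row that decorates the raw columns.
class ExampleRow < OstructDataRow
  def order_number
    order_id.to_i + 1000
  end
end

# Made-up data shaped like one row of a query result.
row = ExampleRow.new('order_id' => '42', 'customer_name' => 'Jane')

puts row.customer_name
puts row.order_number
```

Every column in the hash becomes a reader on the row, so the subclass only has to add the derived values.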
This solution solved one performance problem (Rails' n+1 queries) but introduced another one. Unfortunately we didn't catch that one until production :(
The DataRow extending OpenStruct worked exceptionally well with a small working set. However, in production, when real users hit some of the reports, the working sets became quite large and mongrels started to crash. We have excellent monitoring in place, so we were able to recover quickly, and the crashes were localized to the back-end enterprise Rails application, so the bug was not too terrible. However, the reporting "fix" quickly became a serious issue, as getting stuck behind a slow mongrel is an awful user experience. What made matters worse was that the mongrel was crashing and not just taking forever, which meant we had to go in and clean up. All in all, it was a bad situation and a horrible headache.
Why were we crashing? What was going on? Hadn't we taken a report that would either time out or take 10 minutes due to the n+1 Rails issue down to a single query? What could it be? Of course, the issue was obvious: we were running out of memory because we were creating so many objects. Still, why were we running out of memory? The object count wasn't that high, after all.
It turns out that the problem was in ostruct.rb#new_ostruct_member. Every time we instantiated a DataRow, it opened that object's own singleton class (effectively a new class per row) and defined the accessor methods on it. Of course it does; how did we think OpenStruct worked? Actually, we thought it would define the methods once on the DataRow class that was extending it. Anyway, this was the issue and it was killing us!
- def initialize(hash=nil)
- @table = {}
- if hash
- for k,v in hash
- @table[k.to_sym] = v
- new_ostruct_member(k)
- end
- end
- end
- def new_ostruct_member(name)
- name = name.to_sym
- unless self.respond_to?(name)
- meta = class << self; self; end
- meta.send(:define_method, name) { @table[name] }
- meta.send(:define_method, :"#{name}=") { |x| @table[name] = x }
- end
- end
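The effect of that excerpt can be reproduced without OpenStruct at all. The following is a minimal sketch, assuming a hypothetical `PerInstanceRow` class that mimics the same pattern with `define_singleton_method`; it is not the library's code, only an illustration of where the methods end up.

```ruby
# Hypothetical stand-in that mimics the ostruct.rb excerpt above:
# each accessor is defined on the individual object's singleton class.
class PerInstanceRow
  def initialize(hash)
    @table = {}
    hash.each do |key, value|
      name = key.to_sym
      @table[name] = value
      define_singleton_method(name) { @table[name] }
    end
  end
end

a = PerInstanceRow.new('order_id' => 1)
b = PerInstanceRow.new('order_id' => 2)

# The shared class knows nothing about order_id ...
puts PerInstanceRow.instance_methods(false).include?(:order_id)
# ... because every object carries its own copy of the method ...
puts a.singleton_methods.include?(:order_id)
# ... on its own, separate singleton class.
puts a.singleton_class.equal?(b.singleton_class)
```

With thousands of rows and a dozen or more columns each, that is a singleton class plus a set of method objects per row, which is where the memory went.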
We needed to quickly code ourselves out of this mess. Here are the new versions of the DataRow and Backorder::Row classes.
- class DataRow
- def initialize(attributes)
- @attributes = attributes
- end
- def self.define_row_attribute_methods(row_attributes)
- self.class_eval do
- row_attributes.each do |attribute|
- attribute = attribute.to_s
- define_method(attribute) do
- @attributes.fetch(attribute, nil)
- end
- end
- end
- end
- end
- module Backorder
- class Row < DataRow
- define_row_attribute_methods(
- [:item_id, :order_id, :order_date, :customer_name, :product_name, :brand_name,:color_name,
- :size_name, :gender, :price, :backordered_quantity, :purchase_order_quantity, :received_quantity,
- :outstanding_quantity, :on_order_quantity, :inventory_quantity, :orders, :purchase_orders])
- def order_number
- order_id.to_i + 1000
- end
- def order_ids
- Array.postgres_to_ruby(orders)
- end
- def purchase_order_ids
- Array.postgres_to_ruby(purchase_orders)
- end
- end
- end
Basically, we are defining the methods that we need at the class level when we first load the class. DataRow's internal data structure is a hash, so we can easily instantiate it and we still have all of the functionality that we needed from OpenStruct. Because of the loosely coupled architecture, we were able to swap out the general DataRow and specific Row classes.
Note: I still love the OpenStruct class. It is actually a beautiful class with some great Ruby code. It epitomizes why Ruby is superior to Java. I mean, it uses recursion, meta-programming, aliasing, and method_missing functionality! All of this in about 100 lines of easy to read and understand code.
Second Note: I tried to keep the code in context as much as possible; this is Tobi's real code and it is covered by tests.
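As a quick sanity check of the class-level approach, here is a minimal sketch using hypothetical names (`HashDataRow`, `ExampleReportRow`) and made-up rows. It assumes string keys in the hash, as in the code above.

```ruby
# Hypothetical cut-down version of the hash-backed DataRow.
class HashDataRow
  def initialize(attributes)
    @attributes = attributes
  end

  # Defines one reader per column, once, on the class itself.
  def self.define_row_attribute_methods(row_attributes)
    row_attributes.each do |attribute|
      key = attribute.to_s
      define_method(key) { @attributes.fetch(key, nil) }
    end
  end
end

# Hypothetical report row with two columns and one derived value.
class ExampleReportRow < HashDataRow
  define_row_attribute_methods([:order_id, :customer_name])

  def order_number
    order_id.to_i + 1000
  end
end

rows = [{ 'order_id' => '1' }, { 'order_id' => '2' }].map do |hash|
  ExampleReportRow.new(hash)
end

puts rows.map(&:order_number).inspect
# No per-object methods this time: the readers live on the class.
puts rows.all? { |row| row.singleton_methods.empty? }
puts ExampleReportRow.instance_methods(false).include?(:order_id)
```

The trade-off is that the column list has to be declared up front, but each row is now just an object holding a reference to its hash.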
N'th Note: There is still much more we have done to DRY this up and simplify but it was not germane to how OpenStruct bit us in the ass.
References
http://www.tobi.com
http://errtheblog.com/posts/28-strut-your-structs
http://blog.jayfields.com/2006/09/ruby-stub-variations-openstruct.html
http://www.ruby-doc.org/stdlib/libdoc/ostruct/rdoc/classes/OpenStruct.html